preg_match returns identical elements only once - php

I am going through a string and processing all the elements between !-- and --!. But only unique elements are processed. When I have !--example--! and, a bit further in the text, another !--example--!, the second one is ignored.
This is the code:
while ($do = preg_match("/!--(.*?)--!/", $formtext, $matches)){
I know about preg_match_all, but need to do this with preg_match.
Any help? Thanks in advance!

You'll want PHP to look for matches only after the previous match. For that, you'll need to capture string offsets using the PREG_OFFSET_CAPTURE flag.
Example:
$offset = 0;
while (preg_match("/!--(.*?)--!/", $formtext, $match, PREG_OFFSET_CAPTURE, $offset))
{
    // calculate the next offset: start of this match plus its length
    $offset = $match[0][1] + strlen($match[0][0]);
    // the text captured by the parentheses is accessed like this:
    $paren = $match[1][0];
}
See the preg_match documentation for more info.
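To see the offset loop pick up repeated elements, here is a minimal self-contained sketch. The sample text and the $found array are my own assumptions for illustration, not part of the question.

```php
<?php
// Hypothetical sample input; any text containing !--...--! markers works.
$formtext = 'a !--example--! b !--other--! c !--example--!';

$found  = [];
$offset = 0;
while (preg_match('/!--(.*?)--!/', $formtext, $match, PREG_OFFSET_CAPTURE, $offset)) {
    // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
    $found[] = $match[1][0];
    // Resume the search right after the end of the current match.
    $offset = $match[0][1] + strlen($match[0][0]);
}

print_r($found); // both occurrences of "example" are present
```

Because the search position only moves forward, the subject string never has to be modified inside the loop.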

Use preg_match_all.
Edit: after some clarification, preg_split with PREG_SPLIT_DELIM_CAPTURE is another option:
$string = '!--example--! asdasd !--example--!';
//either this:
$array = preg_split("/!--(.*?)--!/",$string,-1,PREG_SPLIT_DELIM_CAPTURE);
var_dump($array);
array(5) {
[0]=>
string(0) ""
[1]=>
string(7) "example"
[2]=>
string(10) " asdasd "
[3]=>
string(7) "example"
[4]=>
string(0) ""
}
//or this:
$array = preg_split("/(!--(.*?)--!)/",$string,-1,PREG_SPLIT_DELIM_CAPTURE);
var_dump($array);
array(7) {
[0]=>
string(0) ""
[1]=>
string(13) "!--example--!"
[2]=>
string(7) "example"
[3]=>
string(10) " asdasd "
[4]=>
string(13) "!--example--!"
[5]=>
string(7) "example"
[6]=>
string(0) ""
}
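For comparison, if preg_match_all turns out to be acceptable after all, a short sketch with the same sample string shows that identical elements are returned once per occurrence:

```php
<?php
$string = '!--example--! asdasd !--example--!';

// Group 1 of every match ends up in $matches[1], duplicates included.
$count = preg_match_all('/!--(.*?)--!/', $string, $matches);

print_r($matches[1]);
```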

while ($do = preg_match("/[!--(.*?)--!]*/", $formtext, $matches)){
Specify the * at the end of the pattern to specify more than one. They should both get added to your $matches array.

PHP Regex Facebook Video ID

I have the Facebook URLs below (both are Facebook videos) and I want to get the video ID from each.
https://mbasic.facebook.com/TrendingInPhilippinesOfficial/videos/1722369168023859/
https://mbasic.facebook.com/story.php?story_fbid=1722369168023859&id=1388211471439632
Output must be:
1. 1722369168023859
2. 1722369168023859
I used this regex to get the ID.
preg_match("~/videos/(?:t\.\d+/)?(\d+)~i", $_GET['url'], $matches);
echo $matches[1];
It works for #1, but not for #2.
Is there a solution for this?
I'm guessing you want one regex for both links?
$link1 = "https://mbasic.facebook.com/TrendingInPhilippinesOfficial/videos/1722369168023859/";
$link2 = "https://mbasic.facebook.com/story.php?story_fbid=1722369168023859&id=1388211471439632";
$regex = '/(videos|story_fbid)(\/|=)(\d+)(\/|&)?/';
preg_match($regex, $link1, $matches);
preg_match($regex, $link2, $matches2);
Note the ? at the end of the regex, which lets it match even without the trailing / or &. If you only want to match the ID when a trailing / or & is present, remove the question mark from the regex.
The var_dump of $matches would be:
array(5) {
[0]=>
string(24) "videos/1722369168023859/"
[1]=>
string(6) "videos"
[2]=>
string(1) "/"
[3]=>
string(16) "1722369168023859"
[4]=>
string(1) "/"
}
And the var_dump of $matches2 would be:
array(5) {
[0]=>
string(28) "story_fbid=1722369168023859&"
[1]=>
string(10) "story_fbid"
[2]=>
string(1) "="
[3]=>
string(16) "1722369168023859"
[4]=>
string(1) "&"
}
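If only the ID is needed, the same idea can be wrapped in a small helper so the ID is always in group 1. This is a sketch: extractVideoId is a hypothetical name, and the pattern only covers the two URL shapes shown in the question.

```php
<?php
// Hypothetical helper: returns the video ID, or null when neither URL form matches.
function extractVideoId(string $url): ?string
{
    // The prefix is a non-capturing group, so the digits are always group 1.
    if (preg_match('~(?:/videos/|[?&]story_fbid=)(\d+)~', $url, $m)) {
        return $m[1];
    }
    return null;
}

$link1 = "https://mbasic.facebook.com/TrendingInPhilippinesOfficial/videos/1722369168023859/";
$link2 = "https://mbasic.facebook.com/story.php?story_fbid=1722369168023859&id=1388211471439632";

echo extractVideoId($link1), "\n"; // 1722369168023859
echo extractVideoId($link2), "\n"; // 1722369168023859
```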
To get parameters from a URL you can use the parse_url and parse_str functions.
parse_str(parse_url($link2)['query'], $array);
print_r($array);
Output
Array
(
[story_fbid] => 1722369168023859
[id] => 1388211471439632
)
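A slightly more defensive variant of the same approach, as a sketch: PHP_URL_QUERY asks parse_url for the query string only, and the ?? operator avoids notices when a parameter is missing. The variable names $params, $storyId and $pageId are assumptions for illustration.

```php
<?php
$link2 = 'https://mbasic.facebook.com/story.php?story_fbid=1722369168023859&id=1388211471439632';

// With PHP_URL_QUERY, parse_url() returns just the query string (null if there is none).
$query = parse_url($link2, PHP_URL_QUERY);

$params = [];
if (is_string($query)) {
    parse_str($query, $params);
}

// Values come back as strings; ?? supplies a fallback when the key is absent.
$storyId = $params['story_fbid'] ?? null;
$pageId  = $params['id'] ?? null;

echo $storyId, "\n"; // 1722369168023859
```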

How to parse colon-separated key-value text with possible multiline strings

I need to parse the following text:
First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value
The value is a single-line or multiline string, and it may itself contain a "key: blablabla" substring. Such a substring should be ignored (not parsed as a separate key-value pair).
Please help me with regex or other algorithm.
Ideal result would be:
$regex = "/SOME REGEX/";
$matches = [];
preg_match_all($regex, $html, $matches);
// $matches has all parsed key and value pairs, including multiline values
Thank you.
I tried simple regexes, but the result is incorrect because I don't know how to handle multiline values:
$regex = "/(.+?): (.+?)/";
$regex = "/(.+?):(.+?)\n/";
...
You can do it with this pattern:
$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';
preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);
print_r($matches);
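Here is a self-contained sketch of that pattern applied to the sample text from the question. The array_column step and the $pairs name are my additions, used to turn the match sets into a key => value array. One caveat under this pattern: a continuation line that begins with a non-space token followed directly by a colon would be treated as a new key.

```php
<?php
// Sample text from the question, with explicit \n line endings.
$txt = "First: 1\n"
     . "Second: 2\n"
     . "Multiline: blablablabla\n"
     . "bla2bla2bla2\n"
     . "bla3b and key: value in the middle if strting\n"
     . "Fourth: value";

$pattern = '~(?<key>[^:\s]+): (?<value>(?>[^\n]*\R)*?[^\n]*)(?=\R\S+:|$)~';
preg_match_all($pattern, $txt, $matches, PREG_SET_ORDER);

// Each match set carries the named groups "key" and "value".
$pairs = array_column($matches, 'value', 'key');

print_r($pairs);
```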
You can sort of do it, as long as you consider a single word followed by a colon at the start of a line to be a new key start:
$data = 'First: 1
Second: 2
Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
Fourth: value';
preg_match_all('/^([a-z]+): (.*?)(?=(^[a-z]+:|\z))/ims', $data, $matches);
var_dump($matches);
This gives the following result:
array(4) {
[0]=>
array(4) {
[0]=>
string(10) "First: 1
"
[1]=>
string(11) "Second: 2
"
[2]=>
string(86) "Multiline: blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
[3]=>
string(13) "Fourth: value"
}
[1]=>
array(4) {
[0]=>
string(5) "First"
[1]=>
string(6) "Second"
[2]=>
string(9) "Multiline"
[3]=>
string(6) "Fourth"
}
[2]=>
array(4) {
[0]=>
string(3) "1
"
[1]=>
string(3) "2
"
[2]=>
string(75) "blablablabla
bla2bla2bla2
bla3b and key: value in the middle if strting
"
[3]=>
string(5) "value"
}
[3]=>
array(4) {
[0]=>
string(7) "Second:"
[1]=>
string(10) "Multiline:"
[2]=>
string(7) "Fourth:"
[3]=>
string(0) ""
}
}
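Since the values in group 2 keep their trailing line breaks, a small follow-up step can trim them and pair them with the keys. This is a sketch; the array_combine and trim calls and the $pairs name are my additions, not part of the answer above.

```php
<?php
$data = "First: 1\n"
      . "Second: 2\n"
      . "Multiline: blablablabla\n"
      . "bla2bla2bla2\n"
      . "bla3b and key: value in the middle if strting\n"
      . "Fourth: value";

preg_match_all('/^([a-z]+): (.*?)(?=(^[a-z]+:|\z))/ims', $data, $matches);

// Group 1 holds the keys, group 2 the raw values (including trailing newlines).
$pairs = array_combine($matches[1], array_map('trim', $matches[2]));

print_r($pairs);
```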

PHP preg_match get content between

<!--:en-->Apvalus šviestuvas<!--:-->
<!--:ru-->Круглый Светильник<!--:-->
<!--:lt-->Round lighting<!--:-->
I need to get the content between <!--:lt--> and <!--:-->
I have tried:
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('<!--:lt-->+[a-zA-Z0-9]+<!--:-->$', $string, $match);
var_dump($match);
Something is wrong with the syntax and logic. How can I make this work?
preg_match("/<!--:lt-->([a-zA-Z0-9 ]+?)<!--:-->/", $string, $match);
Changes made:
- added delimiters
- added a match group
- added ? to make it ungreedy
- added a space to the character class (there is a space in "Round lighting")
Your result should be in $match[1].
A cooler and more generic variation is:
preg_match_all("/<!--:([a-z]+)-->([^<]+)<!--:-->/", $string, $match);
Which will match all of them. Gives:
array(3) {
[0]=>
array(3) {
[0]=>
string(37) "<!--:en-->Apvalus šviestuvas<!--:-->"
[1]=>
string(53) "<!--:ru-->Круглый Светильник<!--:-->"
[2]=>
string(32) "<!--:lt-->Round lighting<!--:-->"
}
[1]=>
array(3) {
[0]=>
string(2) "en"
[1]=>
string(2) "ru"
[2]=>
string(2) "lt"
}
[2]=>
array(3) {
[0]=>
string(19) "Apvalus šviestuvas"
[1]=>
string(35) "Круглый Светильник"
[2]=>
string(14) "Round lighting"
}
}
Use this pattern: (?<=<!--:lt-->)(.*)(?=<!--:-->)
<?php
$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";
preg_match('~(?<=<!--:lt-->)(.*)(?=<!--:-->)~', $string, $match);
var_dump($match);
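One thing to watch with the greedy (.*) is that it runs to the last <!--:--> in the string, which only works here because the lt block comes last. A sketch of a reusable variant with a lazy quantifier follows; extractLang is a hypothetical helper name, and the s and u modifiers are my assumptions for multi-line, UTF-8 content.

```php
<?php
// Hypothetical helper: returns the text of one language block, or null if absent.
function extractLang(string $text, string $lang): ?string
{
    // preg_quote() escapes the language code; '~' is the delimiter used here.
    // .*? is lazy, so the match stops at the first closing <!--:-->.
    $pattern = '~<!--:' . preg_quote($lang, '~') . '-->(.*?)<!--:-->~su';
    return preg_match($pattern, $text, $m) ? $m[1] : null;
}

$string = "<!--:en-->Apvalus šviestuvas<!--:--><!--:ru-->Круглый Светильник<!--:--><!--:lt-->Round lighting<!--:-->";

var_dump(extractLang($string, 'lt')); // string(14) "Round lighting"
```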

regex breaking Chinese string

When I run this code on some Chinese text, the ni (你) character (and maybe others) gets chopped off and broken.
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\s,]+/", $sample);
var_dump($parts);
//outputs
array(4) {
[0]=>
string(2) "�"
[1]=>
string(9) "不喜欢"
[2]=>
string(6) "香蕉"
[3]=>
string(3) "吗"
}
//in 我觉得 你很 麻烦
//out
array(4) {
[0]=>
string(9) "我觉得"
[1]=>
string(2) "�"
[2]=>
string(3) "很"
[3]=>
string(6) "麻烦"
}
Is my regex wrong?
If your string is in UTF-8, you must use the u modifier:
$sample = "你不喜欢 香蕉 吗";
$parts = preg_split("/[\\s,]+/u", $sample);
var_dump($parts);
If it's in another encoding, see unicornaddict's answer.
Since the input string is multi-byte, I guess you'll have to use mb_split in place of preg_split.
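A likely explanation, offered as an assumption rather than something confirmed in this thread: without the u modifier the pattern is applied byte by byte, and 你 is encoded in UTF-8 as the bytes E4 BD A0. Depending on the locale tables in use, the byte A0 can count as whitespace for \s, which would split the character and leave the two-byte fragment seen in the output. The sketch below shows the bytes and the split with the u modifier; it assumes the source file is saved as UTF-8.

```php
<?php
$sample = "你不喜欢 香蕉 吗";

// The UTF-8 bytes of the character that gets broken.
echo bin2hex('你'), "\n"; // e4bda0

// With /u the subject is handled as UTF-8 characters, so no character is cut in half.
$parts = preg_split('/[\s,]+/u', $sample, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts);
```

Note that with the u modifier the function fails (returns false) when the subject is not valid UTF-8, so it is worth checking the return value.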

Using a regular expression to match each individual character as its own group?

In PHP I'm trying to match each character as its own group, which would mimic str_split(). I tried:
$string = '123abc456def';
preg_match_all('/(.)*/', $string, $array);
// $array = array(2) {
// [0]=> array(2) {
// [0]=> string(12) "123abc456def"
// [1]=> string(0) "" }
// [1]=> array(2) { [0]=> string(1) "f" [1]=> string(0) "" } }
I was expecting something like:
//$array = array(2) {
// [0]=> string(12) "123abc456def",
// [1]=> array(12) {
// [0]=> string(1) "1", [1]=> string(1) "2"
// ...
// [10]=> string(1) "e", [11]=> string(1) "f" } }
The reason I want to use the regular expression instead of a str_split() is because the regex will be the basis of another regex.
The * outside the parens means you want to repeat the capturing group. (This means you will only capture the last iteration.) Try a global match of any single character, like this:
preg_match_all('/(.)/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[0];
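The difference between the two patterns can be checked side by side. A minimal sketch using the string from the question; the variable names $single and $all are mine.

```php
<?php
$string = '123abc456def';

// A quantified group keeps only its final iteration.
preg_match('/(.)*/', $string, $single);
// $single[0] is the whole string, $single[1] is the last character the group matched.

// An unquantified group applied globally yields one capture per character.
preg_match_all('/(.)/', $string, $all, PREG_PATTERN_ORDER);

var_dump($single[1]);      // string(1) "f"
var_dump(count($all[1]));  // int(12)
```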
Try this:
preg_match_all('/./s', $str, $matches);
This does also match line break characters.
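If the input may contain multi-byte characters, adding the u modifier makes the dot match whole UTF-8 characters rather than single bytes. A sketch with a hypothetical sample string, assuming the source file is saved as UTF-8:

```php
<?php
// Hypothetical sample mixing ASCII, a line break and multi-byte characters.
$str = "ab\n你好";

// s: the dot also matches line breaks; u: treat pattern and subject as UTF-8.
preg_match_all('/./su', $str, $matches);
$chars = $matches[0];

var_dump(count($chars)); // int(5)
```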
Maybe this is what you are looking for:
preg_match_all('/(.)+?/', $string, $array);
I don't know if this helps in this situation, but you can access the characters of a string like an array:
<?php
$a = "hede";
print $a[0] . "\n";
print $a[1] . "\n";
print $a[2] . "\n";
print $a[3] . "\n";
?>
will output
h
e
d
e
