Why does preg_match_all() create the same answer multiple times? - php

The following code extracts #hashtags from a tweet and puts them in the variable $matches.
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";
preg_match_all("/(#\w+)/", $tweet, $matches);
var_dump( $matches );
Can someone please explain to me why the following results have 2 identical arrays instead of just 1?
array(2) {
[0]=>
array(3) {
[0]=>
string(8) "#hashtag"
[1]=>
string(8) "#badhash"
[2]=>
string(13) "#goodhash_tag"
}
[1]=>
array(3) {
[0]=>
string(8) "#hashtag"
[1]=>
string(8) "#badhash"
[2]=>
string(13) "#goodhash_tag"
}
}

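Because the parentheses () create a capturing group. preg_match_all() puts the full-pattern matches in $matches[0] and the contents of the first group in $matches[1]; since your group wraps the entire pattern, the two arrays are identical.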
Try:
preg_match_all("/#\w+/", $tweet, $matches);
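To see the difference side by side, here is a small sketch that runs both patterns on the tweet from the question. The variable names $withGroup and $withoutGroup are only for illustration.

```php
<?php
$tweet = "this has a #hashtag a #badhash-tag and a #goodhash_tag";

// With a capturing group: index 0 holds the full matches, index 1 holds group 1.
preg_match_all('/(#\w+)/', $tweet, $withGroup);

// Without the group, only index 0 (the full matches) exists.
preg_match_all('/#\w+/', $tweet, $withoutGroup);

echo count($withGroup), "\n";    // number of sub-arrays with the group
echo count($withoutGroup), "\n"; // number of sub-arrays without it
print_r($withoutGroup[0]);
```

If the group is needed for some other reason, reading only $matches[0] gives the same list of hashtags.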

Why are you using () unless you want it to do exactly that? Sorry, that came out not so friendly :(
See Example 3 at http://php.net/manual/en/function.preg-match-all.php

It's simple:
remove the () from your expression.
Hope it helps.

Related

PHP Regex Facebook Video ID

I have the Facebook URLs below (both are Facebook videos) and I want to get the ID from each of them.
https://mbasic.facebook.com/TrendingInPhilippinesOfficial/videos/1722369168023859/
https://mbasic.facebook.com/story.php?story_fbid=1722369168023859&id=1388211471439632
Output must be:
1. 1388211471439632
2. 1388211471439632
I used this regex to get the ID.
preg_match("~/videos/(?:t\.\d+/)?(\d+)~i", $_GET['url'], $matches);
echo $matches[1];
It works for #1, but not for #2.
Is there a solution for this?
I'm guessing you want one regex for both links?
$link1 = "https://mbasic.facebook.com/TrendingInPhilippinesOfficial/videos/1722369168023859/";
$link2 = "https://mbasic.facebook.com/story.php?story_fbid=1722369168023859&id=1388211471439632";
$regex = '/(videos|story_fbid)(\/|=)(\d+)(\/|&)?/';
preg_match($regex, $link1, $matches);
preg_match($regex, $link2, $matches2);
Note the ? at the end of the regex, which makes the trailing / or & optional. If you only want to match the ID when that trailing character is present, remove the question mark from the regex.
The var_dump of $matches would be:
array(5) {
[0]=>
string(24) "videos/1722369168023859/"
[1]=>
string(6) "videos"
[2]=>
string(1) "/"
[3]=>
string(16) "1722369168023859"
[4]=>
string(1) "/"
}
And the var_dump of $matches2 would be:
array(5) {
[0]=>
string(28) "story_fbid=1722369168023859&"
[1]=>
string(10) "story_fbid"
[2]=>
string(1) "="
[3]=>
string(16) "1722369168023859"
[4]=>
string(1) "&"
}
To get parameters from a URL you can use the parse_url and parse_str functions.
parse_str(parse_url($link2)['query'], $array);
print_r($array);
Output
Array
(
[story_fbid] => 1722369168023859
[id] => 1388211471439632
)
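The two answers can be combined so that one function handles both URL shapes. This is only a sketch: the function name extractVideoId is hypothetical, and it assumes the video ID is either the story_fbid query parameter or the number after /videos/ in the path, as in the two sample links.

```php
<?php
// Hypothetical helper: returns the video ID from either URL form, or null.
function extractVideoId(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return null; // seriously malformed URL
    }

    // Form 2: .../story.php?story_fbid=<id>&id=<page id>
    if (isset($parts['query'])) {
        parse_str($parts['query'], $query);
        $id = $query['story_fbid'] ?? null;
        if (is_string($id) && ctype_digit($id)) {
            return $id;
        }
    }

    // Form 1: .../videos/<id>/ (reuses the pattern from the question)
    if (isset($parts['path'])
        && preg_match('~/videos/(?:t\.\d+/)?(\d+)~', $parts['path'], $m)) {
        return $m[1];
    }

    return null;
}

$link1 = "https://mbasic.facebook.com/TrendingInPhilippinesOfficial/videos/1722369168023859/";
$link2 = "https://mbasic.facebook.com/story.php?story_fbid=1722369168023859&id=1388211471439632";

var_dump(extractVideoId($link1));
var_dump(extractVideoId($link2));
```

Parsing the query string first avoids relying on the order of the parameters, which a single regex over the whole URL would have to account for.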

php regex for detecting #number

I have the following regex, with which I am trying to detect #x, x being a number. I was able to get it working when there is nothing directly around the number (group 2), but it breaks when there is. Can someone help me make this work both ways?
/(\G|\s+|^)#(\d+)((?=\s+)|(?=::)|$)/i
that will work with the line
This is a test #1234 end test
but that will not work with
This is a test #1234end test
This is a test#1234 end test
This is a test.#1234 end test
This is a test #1234. End test
Does anyone know what needs to be changed to achieve this?
Edit: I am trying to allow anything but alphanumeric characters in the 3rd group. Right now it allows :: and whitespace. Is there a way to combine these into one check that rejects letters and numbers?
Running preg_match() with /#\d+/i should get you what you are looking for. So running the following:
$items = [
    "This is a test #1234end test",
    "This is a test#1234 end test",
    "This is a test.#1234 end test",
    "This is a test #1234. End test"
];
// Try the pattern against every sample line and dump what was matched.
foreach ($items as $test) {
    preg_match("/#\d+/i", $test, $matches);
    var_dump($matches);
}
You will get this result:
array(1) {
[0]=>
string(5) "#1234"
}
array(1) {
[0]=>
string(5) "#1234"
}
array(1) {
[0]=>
string(5) "#1234"
}
array(1) {
[0]=>
string(5) "#1234"
}
If you don't want the # in the result, capture only the digits in a subpattern: /#(\d+)/i
That results in the following, where index 1 holds the number without the #:
array(2) {
[0]=>
string(5) "#1234"
[1]=>
string(4) "1234"
}
array(2) {
[0]=>
string(5) "#1234"
[1]=>
string(4) "1234"
}
array(2) {
[0]=>
string(5) "#1234"
[1]=>
string(4) "1234"
}
array(2) {
[0]=>
string(5) "#1234"
[1]=>
string(4) "1234"
}
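(\G|\s+|^)#(\d+)((?=[^[:alnum:]])|$)
I wanted to keep the three groups that I had, so I only changed the 3rd group. I removed the :: and \s (whitespace) alternatives from it and replaced them with a single "not alphanumeric" lookahead, which covers both of those conditions as well.
The pattern breaks down as follows:
(\G|\s+|^) is group 1: the match must begin at the starting offset of the search, after whitespace, or at the start of the string.
#(\d+) is a literal # followed by the digits (group 2).
((?=[^[:alnum:]])|$) is group 3: the digits must be followed by a non-alphanumeric character or by the end of the string.
[^[:alnum:]] is the negated POSIX class that matches any single character that is not a letter or a digit.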
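A quick way to check how this pattern behaves is to run it over a few sample lines. The sketch below uses lines from the question plus one made-up line containing :: and collects group 2 (or null when nothing matches); $lines and $results are illustrative names.

```php
<?php
$pattern = '/(\G|\s+|^)#(\d+)((?=[^[:alnum:]])|$)/';

$lines = [
    'This is a test #1234 end test',   // followed by whitespace
    'This is a test #1234. End test',  // followed by punctuation
    'This is a test #1234::end test',  // followed by :: (made-up sample)
    'This is a test #1234end test',    // followed by a letter
    'This is a test#1234 end test',    // no whitespace before the #
];

$results = [];
foreach ($lines as $line) {
    // Group 2 holds the number; store null when the line does not match.
    $results[] = preg_match($pattern, $line, $m) ? $m[2] : null;
}

var_dump($results);
```

The last two lines give null: in the first of them the number is followed by a letter, and in the second group 1 still requires whitespace or the start of the string before the #.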

PHP - REGEX TO ARRAY like MP3TAG

I would like to ask how to convert a string to an array using a string pattern, like Mp3tag does:
%ALBUM% - %SOMETHING% - %SOMETHING%,
The ' - ' separators are custom characters, not static ones.
In case I didn't make myself clear: I want to turn a custom string into an array, where the pattern is custom rather than static.
Is this possible in PHP, and if so, how?
$str = "%ALBUM% & %SOMETHING% (ノ゜-゜)ノ ︵ ┬──┬ %SOMETHING%,";
preg_match_all("/%([a-z]+)%/i", $str, $matches);
var_dump($matches);
Outputs
array(2) {
[0]=>
array(3) {
[0]=>
string(7) "%ALBUM%"
[1]=>
string(11) "%SOMETHING%"
[2]=>
string(11) "%SOMETHING%"
}
[1]=>
array(3) {
[0]=>
string(5) "ALBUM"
[1]=>
string(9) "SOMETHING"
[2]=>
string(9) "SOMETHING"
}
}
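If the goal is to go one step further and use such a format string to pull the values out of an actual string, the placeholder names found above can be turned into named groups. The following is only a sketch under that assumption: parseByFormat is a hypothetical helper, the sample values are made up, and each placeholder name must be unique within the format because PCRE rejects duplicate group names by default.

```php
<?php
// Hypothetical helper: builds a regex from a format such as
// "%ALBUM% - %ARTIST% - %TITLE%" and uses it to split $subject into fields.
function parseByFormat(string $format, string $subject): ?array
{
    // Even indexes are literal separators, odd indexes are placeholder names.
    $parts = preg_split('/%([a-z]+)%/i', $format, -1, PREG_SPLIT_DELIM_CAPTURE);

    $regex = '';
    foreach ($parts as $i => $part) {
        $regex .= ($i % 2 === 1)
            ? '(?<' . strtolower($part) . '>.+?)' // placeholder: lazy named group
            : preg_quote($part, '~');             // separator: literal text
    }

    if (!preg_match('~^' . $regex . '$~u', $subject, $m)) {
        return null; // the subject does not follow the format
    }

    // Keep only the named entries and drop the numeric duplicates.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

$fields = parseByFormat('%ALBUM% - %ARTIST% - %TITLE%', 'Nevermind - Nirvana - Lithium');
print_r($fields);
```

Because the separators are passed through preg_quote(), they can be any characters, which matches the requirement that the pattern is custom rather than static.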

Regex quantified capture

php > preg_match("#/m(/[^/]+)+/t/?#", "/m/part/other-part/t", $m);
php > var_dump($m);
array(2) {
[0]=>
string(20) "/m/part/other-part/t"
[1]=>
string(11) "/other-part"
}
php > preg_match_all("#/m(/[^/]+)+/t/?#", "/m/part/other-part/t", $m);
php > var_dump($m);
array(2) {
[0]=>
array(1) {
[0]=>
string(20) "/m/part/other-part/t"
}
[1]=>
array(1) {
[0]=>
string(11) "/other-part"
}
}
With this example I would like the capture group to return both /part and /other-part. Unfortunately the regex /m(/[^/]+)+/t/? doesn't capture both, as I expected.
The capture should not be tied to this sample only; it should capture an arbitrary number of repetitions of the group, e.g. /m/part/other-part/and-another/more/t
UPDATE:
Given that this is expected behavior, my question remains: how can I achieve this kind of matching?
Try this one out:
preg_match_all("#(?:/m)?/([^/]+)(?:/t)?#", "/m/part/other-part/another-part/t", $m);
var_dump($m);
It gives:
array(2) {
[0]=>
array(3) {
[0]=>
string(7) "/m/part"
[1]=>
string(11) "/other-part"
[2]=>
string(15) "/another-part/t"
}
[1]=>
array(3) {
[0]=>
string(4) "part"
[1]=>
string(10) "other-part"
[2]=>
string(12) "another-part"
}
}
//EDIT
IMO the best way to do what you want is to use the preg_match() from @stema and explode the result by / to get the list of parts you want.
That's the way capturing groups work: a repeated capturing group only keeps its last match once the regex has finished. In your test that is "/other-part".
Try this instead
/m((?:/[^/]+)+)/t/?
See it here on Regexr; when you hover over the match, you can see the content of the capturing group.
Just make your group non-capturing by adding ?: at the start, and put another (capturing) group around the whole repetition.
In php
preg_match_all("#/m((?:/[^/]+)+)/t/?#", "/m/part/other-part/t", $m);
var_dump($m);
Output:
array(2) {
[0]=> array(1) {
[0]=>
string(20) "/m/part/other-part/t"
}
[1]=> array(1) {
[0]=>
string(16) "/part/other-part"
}
}
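If the individual parts are needed, the captured string can then be split on /, as suggested in the other answer. A minimal sketch, using the longer sample path from the question; the variable name $parts is only for illustration.

```php
<?php
$subject = '/m/part/other-part/and-another/more/t';

$parts = [];
// Group 1 captures the whole repeated section between /m and /t.
if (preg_match('#/m((?:/[^/]+)+)/t/?#', $subject, $m)) {
    // Drop the leading slash, then split the rest into its segments.
    $parts = explode('/', ltrim($m[1], '/'));
}

var_dump($parts);
```

This keeps the validation of the overall shape (/m ... /t) in the regex and leaves the splitting to a plain string function.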
As already written in a comment, you can't do this in one go because preg_match does not return every repetition of a repeated subgroup (as .NET does, for example; see Get repeated matches with preg_match_all()). So you can split the operation into multiple steps:
1. Match the subject and extract the part you're interested in.
2. Run a second match on that part only.
Code:
$subject = '/m/part/other-part/t';
$subpattern = '/[^/]+';
$pattern = sprintf('~/m(?<path>(?:%s)+)/t/?~', $subpattern);
$r = preg_match($pattern, $subject, $matches);
if (!$r) return;
$r = preg_match_all("~$subpattern~", $matches['path'], $matches);
var_dump($matches);
Output:
array(1) {
[0]=>
array(2) {
[0]=>
string(5) "/part"
[1]=>
string(11) "/other-part"
}
}

preg_match returns identical elements only once

I am going through a string and processing all the elements between !-- and --!. But only unique elements are processed. When I have !--example--! and a bit further in the text also !--example--!, the second one is ignored.
This is the code:
while ($do = preg_match("/!--(.*?)--!/", $formtext, $matches)){
I know about preg_match_all, but need to do this with preg_match.
Any help? Thanks in advance!
You'll want PHP to look for matches only after the previous match. For that, you'll need to capture string offsets using the PREG_OFFSET_CAPTURE flag.
Example:
$offset = 0;
while (preg_match("/!--(.*?)--!/", $formtext, $match, PREG_OFFSET_CAPTURE, $offset))
{
    // calculate next offset
    $offset = $match[0][1] + strlen($match[0][0]);
    // the parenthesis text is accessed like this:
    $paren = $match[1][0];
}
See the preg_match documentation for more info.
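As a complete, runnable version of the loop above, here is a sketch that collects every element, including repeated ones. The sample text and the $found array are assumptions made for the example.

```php
<?php
$formtext = '!--example--! asdasd !--example--! and !--other--!';

$offset = 0;
$found = [];
while (preg_match('/!--(.*?)--!/', $formtext, $match, PREG_OFFSET_CAPTURE, $offset)) {
    // With PREG_OFFSET_CAPTURE each entry is [matched text, byte offset].
    $found[] = $match[1][0];

    // Continue searching right after the end of the current full match.
    $offset = $match[0][1] + strlen($match[0][0]);
}

var_dump($found);
```

Since the search always resumes after the previous match, identical elements are visited once each in document order.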
Use preg_match_all
edit: some clarification yields:
$string = '!--example--! asdasd !--example--!';
//either this:
$array = preg_split("/!--(.*?)--!/",$string,-1,PREG_SPLIT_DELIM_CAPTURE);
var_dump($array);
array(5) {
[0]=>
string(0) ""
[1]=>
string(7) "example"
[2]=>
string(8) " asdasd "
[3]=>
string(7) "example"
[4]=>
string(0) ""
}
//or this:
$array = preg_split("/(!--(.*?)--!)/",$string,-1,PREG_SPLIT_DELIM_CAPTURE);
var_dump($array);
array(7) {
[0]=>
string(0) ""
[1]=>
string(13) "!--example--!"
[2]=>
string(7) "example"
[3]=>
string(8) " asdasd "
[4]=>
string(13) "!--example--!"
[5]=>
string(7) "example"
[6]=>
string(0) ""
}
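while ($do = preg_match("/[!--(.*?)--!]*/", $formtext, $matches)){
Note that wrapping the pattern in [...]* like this does not make preg_match return more than one match. Square brackets define a character class, so the pattern no longer looks for the !-- and --! delimiters, and the * also allows an empty match. preg_match stops after the first match regardless of quantifiers, so $matches will not contain both elements; use the offset loop or preg_match_all instead.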
