Extract all urls from preg_match_all - php

I'm working on code that fetches the href URLs from the variable $message after loading the data from the database. My problem is that preg_match_all returns the URLs wrapped in an extra outer array, so the output is nested one level deeper than I expect.
Here is the output:
Array ( [0] => Array ( [0] => https://example.com/s-6?sub=myuserid [1] => https://example.com/s-6?sub=myuserid
[2] => https://example.com/s-6?sub=myuserid [3] => https://www.example2.com/1340253724 [4] => https://example.com/s-6?sub=myuserid ) )
It should be:
Array ( [0] => https://example.com/s-6?sub=myuserid [1] => https://example.com/s-6?sub=myuserid
[2] => https://example.com/s-6?sub=myuserid [3] => https://www.example2.com/1340253724 [4] => https://example.com/s-6?sub=myuserid )
Here is a minimal example:
<?php
$message = 'Click Here!
Watch The Video Here!
HERE
Example2.com/1340253724
Here';
//find the href urls from the variable
$regex = '/https?\:\/\/[^\" ]+/i';
preg_match_all($regex, $message, $matches);
print_r($matches);
?>
I have tried to use a different way like this:
foreach($matches as $url)
{
echo $url;
}
And also I have tried this:
foreach($matches as $url)
{
$urls_array[] = $url;
}
print_r($urls_array);
The results are still the same. I have searched on Google but could not find a solution.
I have no idea how to fetch the href URLs with preg_match_all so that I can display the elements and store them in a flat array.
The problem seems to have something to do with the variable $matches.
Can you please show me an example of how to fetch the href URLs with preg_match_all so that I can store the elements in an array?
Thank you.

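As the documentation for preg_match_all says (in the default PREG_PATTERN_ORDER mode):
$out[0] contains array of strings that matched full pattern, and
$out[1] contains array of strings enclosed by tags.
Your pattern has no capturing groups, so all the URLs are in $matches[0]. You could do the following: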
foreach($matches[0] as $url)
{
echo $url;
}
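For reference, here is a minimal sketch of the whole flow. The HTML in $message is an assumption (the original markup is not shown in the question), and $urls is just an illustrative variable name. It shows that the flat list the question asks for is simply $matches[0]:

```php
<?php
// Hypothetical message: the real HTML from the database is not shown in the question.
$message = '<a href="https://example.com/s-6?sub=myuserid">Click Here!</a> '
         . '<a href="https://www.example2.com/1340253724">Example2.com/1340253724</a>';

// The pattern has no capturing groups, so preg_match_all fills only $matches[0].
preg_match_all('/https?:\/\/[^" ]+/i', $message, $matches);

// $matches[0] is already the flat list of full-pattern matches.
$urls = $matches[0];

print_r($urls);
```

Copying $matches[0] into its own variable keeps the rest of the code independent of how preg_match_all structures its result.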

Try this:
foreach($matches[0] as $url)
{
echo $url;
}

Hi,
As far as I understand, your problem is that you get the results wrapped in one nested array too many, so you can't read your URLs directly?
One solution is to get rid of the unnecessary outer array. You can do this with the PHP array function array_shift().
From php.net manual
array_shift() shifts the first value of the array off and returns it [...]
So the trick is that the returned value will be your array of data, which you can then loop through.
A short sample for your case:
//from the moment when you use preg_match_all and have matches
preg_match_all($regex, $message, $matches);
$urls = array_shift($matches);
foreach($urls as $url) {
//do something with URL
}
Of course you can use array_shift() in other ways; that's just a simple sample ;)
Cheers!

Related

how to prevent preg_match/preg_match_all from creating unnecessary array elements [duplicate]

Example:
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i', $string, $arr_result);
print_r($arr_result);
Returns:
Array
(
[0] => 2010-07-18
[date] => 2010-07-18
[1] => 2010-07-18
)
But I want it to be:
Array
(
[date] => 2010-07-18
)
In PHP's PDO object there is an option that is filtering results from database by removing these duplicate numbered values : PDO::FETCH_ASSOC. But I haven't seen similar modifier for the PCRE functions in PHP yet.
How to return only named groups with preg_match or preg_match_all?
This is currently (PHP7) not possible.
You will always get a mixed type array, containing numeric and named keys.
Let's quote the PHP manual (http://php.net/manual/en/regexp.reference.subpatterns.php):
This subpattern will then be indexed in the matches array by its
normal numeric position and also by name.
To solve the problem the following code snippets might help:
1. filter the array by using an is_string check on the array key (for PHP5.6+)
$array_filtered = array_filter($array, "is_string", ARRAY_FILTER_USE_KEY);
2. foreach over the elements and unset if array key is_int() (all PHP versions)
/**
* @param array $array
* @return array
*/
function dropNumericKeys(array $array)
{
foreach ($array as $key => $value) {
if (is_int($key)) {
unset($array[$key]);
}
}
return $array;
}
It's a simple PHP function named dropNumericKeys(). It post-processes a matches array after a preg_match*() call that uses named groups. The function accepts an $array, iterates over it and unsets all keys of integer type, leaving keys of string type untouched. Finally, it returns the array, which now contains only the named keys.
Note: The function is for backward compatibility with older PHP versions. It works on all versions. The array_filter solution relies on the constant ARRAY_FILTER_USE_KEY, which is only available in PHP 5.6+. See http://php.net/manual/de/array.constants.php#constant.array-filter-use-key
preg_match does not have any flag or option that makes it return only named matches (yet). So what you want is not directly possible. However, you can remove all items with unwanted keys from your matches array, and then you get what you're looking for:
$matches = array_intersect_key($matches, array_flip(array('name', 'likes')));
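The keys 'name' and 'likes' in the line above are placeholders for your own group names. As a rough sketch, applied to the date example from the question (the $wanted variable is only illustrative), it might look like this:

```php
<?php
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i', $string, $matches);

// Whitelist of group names to keep; 'date' is the only named group in this pattern.
$wanted = array_flip(array('date'));

// Keep only the entries of $matches whose key appears in the whitelist.
$matches = array_intersect_key($matches, $wanted);

print_r($matches);
```

Unlike a generic "drop the numeric keys" filter, this requires listing the group names explicitly, which also lets you keep just a subset of the named groups.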
I do not think you can make preg_* do it, but you can do it with a simple loop. But I don't see why those elements pose a problem.
It is also possible to unset all numeric indexes before returning:
foreach (range(0, floor(count($arr_result) / 2)) as $index) {
unset($arr_result[$index]);
}
Similar to the answer that hakre posted above, I use this snippet to get just the named parameters:
$subject = "This is some text written on 2010-07-18.";
$pattern = '|(?<date>\d\d\d\d-\d\d-\d\d)|i';
preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
echo '<pre>Before Diff: ', print_r($matches, 1), '</pre>';
$matches = array_diff_key($matches[0], range(0, count($matches[0])));
echo '<pre>After Diff: ', print_r($matches, 1), '</pre>';
...which produces this:
Before Diff: Array
(
[0] => Array
(
[0] => 2010-07-18
[date] => 2010-07-18
[1] => 2010-07-18
)
)
After Diff: Array
(
[date] => 2010-07-18
)
I read in your post that the concern is possible memory overhead later on, etc.
In that case, why not solve it with unset():
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d{4}-\d{2}-\d{2})|i', $string, $arr_result);
$date = array("date" => $arr_result['date']);
unset($arr_result, $string); // delete the original preg_match array and string
print_r($date);
//or create a new:
// $arr_result = $date;
//print_r($arr_result);
You could use T-Regx and go with group() or namedGroups(), which returns only the named capturing groups.
<?php
$subject = "This is some text written on 2010-07-18.";
pattern('(?<date>\d\d\d\d-\d\d-\d\d)', 'i')->match($subject)->first(function ($match) {
$date = $match->get('date');
// 2010-07-18
$groups = $match->namedGroups();
// [
// 'date' => '2010-07-18'
// ]
});
I combined some of the code posted here; this is the final code, which works on PHP 5.6+:
$re = '/\d+\r\n(?<start>[\d\0:]+),\d+\s--\>\s(?<end>[\d\0:]+),.*\r\nHOME.*\r\nGPS\((?<x>[\d\.]+),(?<y>[\d\.]+),(?<d>[\d\.]+)\)\sBAROMETER\:(?<h>[\d\.]+)/';
$str= file_get_contents($srtFile);
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
echo '<pre>';
$filtered=array_map(function ($d){
return $array_filtered = array_filter($d, "is_string", ARRAY_FILTER_USE_KEY);
},$matches);
var_dump($filtered);
If you are interested in what it does: it reads position data from the SRT file that DJI drones generate while recording video.
Try this:
$string = "This is some text written on 2010-07-18.";
preg_match('|(?<date>\d\d\d\d-\d\d-\d\d)|i',$string,$arr_result);
echo $arr_result['date'];

How to pick numbers between underlines using regex?

I want to get only the value in bold (the number between the underscores, e.g. 194419414 in the first example), but I can't work out how.
349141_194419414_4828414_n.jpg
or
https:// hphotos-ash3.net/t1.0-9/1146_54482593153_1214114_n.jpg
Thanks in advance.
You can use preg_match with a capture group to get the result:
<?php
$searchText = "349141_194419414_4828414_n.jpg";
$result = preg_match("/_(\\d+)_/u", $searchText, $matches);
print_r($matches[1]);
?>
output:
194419414
(I'm not sure whether this is a good method, but you can get whichever value you want this way.)
$r="349141_194419414_4828414_n";
print_r(explode('_',$r));
output:
Array ( [0] => 349141 [1] => 194419414 [2] => 4828414 [3] => n )
$rr=explode('_',$r);
echo $rr[1];
output
194419414
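For the URL form of the input, the same explode() idea might be sketched like this. The host in $url is a made-up placeholder; only the file name format comes from the question:

```php
<?php
// Hypothetical URL; only the file name format is taken from the question.
$url = 'https://example.com/t1.0-9/1146_54482593153_1214114_n.jpg';

// Extract the path, then its last segment (the file name).
$filename = basename(parse_url($url, PHP_URL_PATH));

// Split on underscores; index 1 is the second number in the name.
$parts = explode('_', $filename);

echo $parts[1];
```

Taking the basename first means underscores elsewhere in the URL cannot shift the index.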
Try something like this:
.+/\d+_(\d+)_\d+_n.jpg
Here's a regular expression answer.
$filename = '349141_194419414_4828414_n.jpg';
preg_match_all('/[0-9]+/', $filename, $matches);
echo $matches[0][1]; //194419414

preg_match and regex error while parsing string?

I need to scrape part of a URL using preg_match, but I never get what I need.
Here is the example:
$item = "http://example.com/0229883504/?r=2-OR1&p=1";
$item = preg_match_all("/href[^\"]+/i",$item,$matches);
print_r($matches);
I need to return this number
0229883504
I tried a lot but when I var_dump the matches array, it gives:
Array ( [0] => Array ( [0] => href= ) )
I know that the problem is within the pattern but I'm not so good in this part :)
This is the code you need:
$item = "http://example.com/0229883504/?r=2-OR1&p=1";
// Do not assign the return value to $item: preg_match_all returns the match count, not the matches
preg_match_all("#http://.*?/(.*?)/.*#i", $item, $matches);
print_r($matches);
If you need to extract the value 0229883504, you can add these lines:
$result = $matches[1][0];
echo $result;
and it will work as you can see here: http://ideone.com/H2E9I
This will do the trick for the example above:
preg_match_all("/http:\/\/example\.com\/([a-zA-Z0-9]+)\//", $item, $matches);
or, more permissively:
preg_match_all("/http:\/\/example\.com\/(.+)\//", $item, $matches);
However, if your domains can vary, use the code from Aurelio's answer :).
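If the domain can vary, one alternative that avoids regular expressions altogether is to let parse_url() isolate the path and take its first segment. This is only a sketch and assumes the number is always the first path segment; $path, $segments and $id are illustrative names:

```php
<?php
$item = "http://example.com/0229883504/?r=2-OR1&p=1";

// For the URL above this yields "/0229883504/".
$path = parse_url($item, PHP_URL_PATH);

// Drop the surrounding slashes and split the path into segments.
$segments = explode('/', trim($path, '/'));

// Assumption: the wanted number is always the first path segment.
$id = $segments[0];

echo $id;
```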

preg_match post-id

I'm trying to get the post IDs of my posts.
This is the format they are in: postid-785645
The numbers are randomly generated.
<?php
$mysite = "http://example.com";
$wholepage = @file_get_contents($mysite);
// I want to use preg_match to echo just the numbers out of the whole page.
// Basically, I want it to search the whole page for postid-(randomly generated number) and echo the number from everything in this format.
?>
You could try using the following regular expression:
"/postid-[0-9]+/"
Example code:
$wholepage = 'foo postid-785645 bar postid-785646 baz';
$matches = array();
$count = preg_match_all("/postid-[0-9]+/", $wholepage, $matches); // $count is the number of matches
print_r($matches);
Result:
Array
(
[0] => Array
(
[0] => postid-785645
[1] => postid-785646
)
)
I'm using preg_match_all instead of preg_match because preg_match stops after it finds the first match.
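Since the question asks for just the numbers, a small variation with a capturing group may be closer to what is wanted. This is a sketch using the same sample string as above; with the default flags, the digits end up in $matches[1]:

```php
<?php
$wholepage = 'foo postid-785645 bar postid-785646 baz';

// The parentheses capture only the digits that follow "postid-".
preg_match_all('/postid-([0-9]+)/', $wholepage, $matches);

// $matches[0] holds the full matches, $matches[1] the captured digits.
foreach ($matches[1] as $id) {
    echo $id, "\n";
}
```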
