PHP regex doesn't seem to work as expected

String:
https://fakedomain.com/2017/07/01/the-string-i-want-to-get/
Code:
$url = 'https://fakedomain.com/2017/07/01/the-string-i-want-to-get/';
$out = [];
preg_match('\/\d{4}\/\d{2}\/\d{2}(.*)', $url, $out);
// At this point $out is empty...
// Also... I tried this (separately)
$keywords = preg_split("\/\d{4}\/\d{2}\/\d{2}(.*)", $url);
// also $keywords is empty...
I've tested the regex externally and it works. I want to split out the /the-string-i-want-to-get/ string. What am I doing wrong?

I would not use a regex. In this case it's better to use parse_url and some other helpers like trim and explode.
<?php
$url = 'https://fakedomain.com/2017/07/01/the-string-i-want-to-get/';
$parsed = parse_url($url);
$Xploded = explode('/',trim($parsed['path'],'/'));
print $Xploded[count($Xploded)-1];
// outputs: the-string-i-want-to-get

There's a function for that:
echo basename($url);

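preg_split splits the given string by a regular expression, so your $url would be split at the date part. That's not what you need here. Use preg_match instead, and note that the pattern must be wrapped in delimiters (the outer slashes below), which your original pattern was missing: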
<?php
$url = 'https://fakedomain.com/2017/07/01/the-string-i-want-to-get/';
$out = [];
preg_match('/\/\d{4}\/\d{2}\/\d{2}(.*)/', $url, $out);
// See here...
var_dump($out);
You will get an array of two elements:
array(2) {
[0]=>
string(37) "/2017/07/01/the-string-i-want-to-get/"
[1]=>
string(26) "/the-string-i-want-to-get/"
}
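As a follow-up to the answer above, here is a minimal sketch of the same match with `#` as the pattern delimiter, so the slashes inside the pattern need no escaping. The capture group and the variable name `$slug` are illustrative choices, not part of the original question; the sketch assumes the wanted segment directly follows the date.

```php
<?php
$url = 'https://fakedomain.com/2017/07/01/the-string-i-want-to-get/';

// '#' is the delimiter here, so '/' can be written literally.
// ([^/]+) captures the segment after the date, without surrounding slashes.
if (preg_match('#/\d{4}/\d{2}/\d{2}/([^/]+)/?$#', $url, $m)) {
    $slug = $m[1];
} else {
    $slug = null; // the URL did not have the expected shape
}

echo $slug; // the-string-i-want-to-get
```

Checking the return value of preg_match before reading the match array avoids undefined-index notices when a URL does not follow the date pattern.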

Related

Extract particular point of URL in PHP

I'm trying to get a very specific part of a URL using PHP so that I can use it as a variable later on.
The URL I have is:
https://forums.mydomain.com/index.php?/clubs/11-Default-Club
The particular part I am trying to extract is the 11 part between the /clubs/ and -Default-Club bits.
I was wondering what the best way to do this was. I've seen examples on here that use a regex-esque parser but I can't wrap my head around it for this particular instance.
Thanks
Edit: this is what I've tried so far using explode, but it seems to give me all sorts of elements that are not present in the URL above:
$url = $_SERVER['REQUEST_URI'];
$url = explode('/', $url);
$url = array_filter($url);
$url = array_merge($url, array());
Which returns:
Array ( [0] => index.php?app=core&module=system&controller=widgets&do=getBlock&blockID=plugin_9_bimBlankWidget_dqtr03ssz&pageApp=core&pageModule=clubs&pageController=view&pageArea=header&orientation=horizontal&csrfKey=8e19769b95c733b05439755827a98ac8 )
If you expect that the string with dashes (11-Default-Club) will always be at the end, you can try this:
$url = $_SERVER['REQUEST_URI'];
$urlParts = explode('/', $url);
$string = end($urlParts);
$stringParts = explode('-', $string);
$theNumber = $stringParts[0]; // this will be 11
I'd rather be explicit:
<?php
$url = 'https://forums.mydomain.com/index.php?/clubs/11-Default-Club';
$query = parse_url($url, PHP_URL_QUERY);
$pattern = '#^/clubs/(\d+)[a-zA-Z-]+$#';
$digits = preg_match($pattern, $query, $matches)
? $matches[1]
: null;
var_dump($digits);
Output:
string(2) "11"
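If this URL structure is fixed for all URLs on your site and you only want the numeric part of the URL (note that FILTER_SANITIZE_NUMBER_INT keeps every digit, plus sign, and minus sign in the string, so this breaks as soon as the URL contains other digits or a sign character before the ID):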
<?php
$url = 'https://forums.mydomain.com/index.php?/clubs/11-Default-Club';
$int = (int) filter_var($url, FILTER_SANITIZE_NUMBER_INT);
echo $int;
If this URL structure is fixed for all URLs on your site, then the code below is the best way to get your value.
<?php
$url = "https://forums.mydomain.com/index.php?/clubs/11-Default-Club";
$url = explode('/', $url);
$url = array_filter($url);
$end = end($url);
$end_parts = explode('-',$end);
echo $end_parts[0];
Output:
11

convert string (JS object format) to PHP object

How can I convert a string fetched from a URL to a PHP object? I know that it's not valid JSON, but I still don't know how to convert it properly.
The following code returns NULL:
$url = 'https://cdn.shopify.com/s/javascripts/currencies.js';
$decodedCurrencies = json_decode(file_get_contents($url));
var_dump($decodedCurrencies);
You may use regex to extract rates array and then decode it.
Something like this:
$url = 'https://cdn.shopify.com/s/javascripts/currencies.js';
$script = file_get_contents($url);
$matches = [];
preg_match('/.+(\{.+}).+/', $script, $matches);
$decodedCurrencies = json_decode($matches[1]);
var_dump($decodedCurrencies);
Output:
object(stdClass)#1 (179) {
["USD"]=>
float(1)
["EUR"]=>
float(1.1637)
["GBP"]=>
float(1.31291)
["CAD"]=>
float(0.778138)
["ARS"]=>
float(0.0566433)
["AUD"]=>
float(0.766026)
["BRL"]=>
float(0.303944)
["CLP"]=>
float(0.00157516)
...
}
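A slightly more defensive variant of the idea above could look like the sketch below. It assumes the script contains a `rates: {...}` object literal whose body happens to be valid JSON; the sample string stands in for the downloaded file and is hypothetical, as is the exact pattern.

```php
<?php
// Hypothetical sample standing in for file_get_contents($url).
$script = 'var Currency = { rates: {"USD":1.0,"EUR":1.1637}, convert: function(a, b, c) { return a; } };';

$rates = null;
// Capture only the brace-delimited block that follows "rates:".
// [^{}]* stops at the first closing brace, so nested objects are not supported.
if (preg_match('/rates:\s*(\{[^{}]*\})/', $script, $m)) {
    $decoded = json_decode($m[1], true); // true: return an associative array
    if (json_last_error() === JSON_ERROR_NONE) {
        $rates = $decoded;
    }
}

var_dump($rates);
```

Checking json_last_error() distinguishes "no match or invalid JSON" from a successfully decoded value, which a bare json_decode() call returning NULL does not.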

Remove last child page from URI

How can one dynamically find and remove the last child of a website path URI?
Code: $uri = $_SERVER["REQUEST_URI"];
Result: http://192.168.0.16/wordpress/blog/page-2/
Desired result: http://192.168.0.16/wordpress/blog/
Many thanks in advance!
You can use this to get the required output:
// the full URL as a string
$url = "http://192.168.0.16/wordpress/blog/page-2/";
// remove the trailing slash
$url = rtrim($url, '/');
// explode the string into an array
$url = explode('/', $url);
// remove the last element (array_pop() returns it; the array is modified in place)
array_pop($url);
// implode the array back into a string
echo implode('/', $url);
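Another approach, applied to the path only (on a full URL, array_filter() would also drop the empty segment after "http:" and leave "http:/192.168.0.16/..."):
// explode the path into an array
$path = explode('/', '/wordpress/blog/page-2/');
// array_filter() without a callback removes the empty (falsy) segments left by the leading and trailing slashes
$path = array_filter($path);
// remove the last element (the array is modified in place)
array_pop($path);
// implode the array back into a string and restore the surrounding slashes
echo '/' . implode('/', $path) . '/';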
$url = 'http://192.168.0.16/wordpress/blog/page-2/';
// trim any slashes at the end
$trim_url = rtrim($url,'/');
// explode with slash
$url_array = explode('/', $trim_url);
// remove last element
array_pop($url_array);
// implode with slash
echo $new_url = implode('/', $url_array);
Output:
http://192.168.0.16/wordpress/blog
The correct way would be to use parse_url() and dirname(), which will also support query params. You could explode $uri['path'], but it's unnecessary in this case.
<?php
// explode the uri in its proper parts
$uri = parse_url('/wordpress/blog/page-2/?id=bla');
// remove last element
$path = dirname($uri['path']);
// in case you got query params, append them
if (!empty($uri['query'])) {
$path .= '?'.$uri['query'];
}
// string(22) "/wordpress/blog?id=bla"
var_dump($path);
See it working: https://3v4l.org/joJrF
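The parse_url() and dirname() approach can be extended to full URLs that include a scheme and host. The helper below is a sketch only; the name `parentUrl` is hypothetical, and it keeps a trailing slash to match the desired result in the question.

```php
<?php
// Hypothetical helper: drop the last path segment of a URL or a bare path.
function parentUrl(string $url): string
{
    $p = parse_url($url);
    if ($p === false) {
        return $url; // seriously malformed input: leave it untouched
    }

    // dirname() removes the last segment; rtrim() normalises the root case.
    $parent = rtrim(dirname($p['path'] ?? '/'), '/\\') . '/';

    $out = '';
    if (isset($p['scheme'], $p['host'])) {
        $out = $p['scheme'] . '://' . $p['host'];
        if (isset($p['port'])) {
            $out .= ':' . $p['port'];
        }
    }
    $out .= $parent;
    if (isset($p['query'])) {
        $out .= '?' . $p['query'];
    }
    return $out;
}

echo parentUrl('http://192.168.0.16/wordpress/blog/page-2/');
// http://192.168.0.16/wordpress/blog/
```

Note that $_SERVER["REQUEST_URI"] normally holds only the path and query string, so in that case the scheme and host branch is simply skipped.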

php regex to get string inside href tag

I need a regex that will give me the string inside an href tag and inside the quotes also.
For example i need to extract theurltoget.com in the following:
<a href="theurltoget.com">URL</a>
Additionally, I only want the base url part. I.e. from http://www.mydomain.com/page.html i only want http://www.mydomain.com/
Don't use regex for this. You can use XPath and built-in PHP functions to get what you want:
$xml = simplexml_load_string($myHtml);
$list = $xml->xpath("//@href");
$preparedUrls = array();
foreach($list as $item) {
$item = parse_url($item);
$preparedUrls[] = $item['scheme'] . '://' . $item['host'] . '/';
}
print_r($preparedUrls);
$html = '<a href="http://www.mydomain.com/page.html">URL</a>';
$url = preg_match('/<a href="(.+)">/', $html, $match);
$info = parse_url($match[1]);
echo $info['scheme'].'://'.$info['host']; // http://www.mydomain.com
This expression handles three cases:
no quotes
double quotes
single quotes
'/href=["\']?([^"\'>]+)["\']?/'
Use the answer by @Alec if you're only looking for the base URL part (the second part of the question by @David)!
$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/<a href="(.+)">/', $html, $match);
$info = parse_url($match[1]);
This will give you:
$info
Array
(
[scheme] => http
[host] => www.mydomain.com
[path] => /page.html" class="myclass" rel="myrel
)
So you can use $href = $info["scheme"] . "://" . $info["host"]
Which gives you:
// http://www.mydomain.com
When you are looking for the entire URL inside the href, you should use another regex, for instance the one provided by @user2520237.
$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/href=["\']?([^"\'>]+)["\']?/', $html, $match);
$info = parse_url($match[1]);
this will give you:
$info
Array
(
[scheme] => http
[host] => www.mydomain.com
[path] => /page.html
)
Now you can use $href = $info["scheme"] . "://" . $info["host"] . $info["path"];
Which gives you:
// http://www.mydomain.com/page.html
http://www.the-art-of-web.com/php/parse-links/
Let's start with the simplest case - a well formatted link with no extra attributes:
/<a href=\"([^\"]*)\">(.*)<\/a>/iU
To replace all href values:
function replaceHref($html, $replaceStr)
{
    $match = array();
    // [^"]+ keeps each match inside its own pair of quotes
    preg_match_all('/<a [^>]*href="([^"]+)"/', $html, $match);
    // $match[1] holds the captured URLs; array_unique() avoids handling a repeated URL twice
    foreach (array_unique($match[1]) as $href)
    {
        $html = str_replace($href, $replaceStr.urlencode($href), $html);
    }
    return $html;
}
$replaceStr = "http://affilate.domain.com?cam=1&url=";
$replaceHtml = replaceHref($html, $replaceStr);
echo $replaceHtml;
This will handle the case where there are no quotes around the URL.
/<a [^>]*href="?([^">]+)"?>/
But seriously, do not parse HTML with regex. Use DOM or a proper parsing library.
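To illustrate the DOM route recommended here, the sketch below uses PHP's bundled DOMDocument class (it requires the dom extension) to collect the base URL of every link. The sample markup and the variable name `$bases` are assumptions made for the example.

```php
<?php
// Sample markup; real pages are often not well-formed, which DOMDocument tolerates.
$html = '<p><a href="http://www.mydomain.com/page.html" class="myclass">URL</a></p>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$bases = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $parts = parse_url($a->getAttribute('href'));
    // Skip relative links, which have no scheme or host.
    if (isset($parts['scheme'], $parts['host'])) {
        $bases[] = $parts['scheme'] . '://' . $parts['host'] . '/';
    }
}

print_r($bases);
```

Unlike the regex variants, this does not depend on the order of attributes or on the quoting style used in the markup.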
~href="(https?://[^/]*)~
I think you should be able to handle the rest.
Because lookbehind and lookahead assertions are cool:
/(?<=href=")[^"]+(?=")/
It will match only what you want, without quotation marks
Array (
[0] => theurltoget.com )

How extract part of an URL in PHP to remove specific part?

So, I have this URL in a string:
http://www.domain.com/something/interesting_part/?somevars&othervars
In PHP, how can I get rid of everything but interesting_part?
...
$url = 'http://www.domain.com/something/interesting_part/?somevars&othervars';
$parts = explode('/', $url);
echo $parts[4];
Output:
interesting_part
Try:
<?php
$url = 'http://www.domain.com/something/interesting_part/?somevars&othervars';
preg_match('`/([^/]+)/[^/]*$`', $url, $m);
echo $m[1];
You should use parse_url to do operations with URL. First parse it, then do changes you desire, using, for example, explode, then put it back together.
$uri = "http://www.domain.com/something/interesting_part/?somevars&othervars";
$uri_parts = parse_url( $uri );
/*
you should get:
array(4) {
["scheme"]=>
string(4) "http"
["host"]=>
string(14) "www.domain.com"
["path"]=>
string(28) "/something/interesting_part/"
["query"]=>
string(18) "somevars&othervars"
}
*/
...
// whatever regex or explode (regex seems to be a better idea now)
// used on $uri_parts[ "path" ]
...
$new_uri = $uri_parts[ "scheme" ] . "://" . $uri_parts[ "host" ] ... . $new_path ...
If the interesting part is always the last part of the path:
echo basename(parse_url($url, PHP_URL_PATH));
Please note that this will only work without index.php or any other file name before the ?. This one will work for both cases:
$path = parse_url($url, PHP_URL_PATH);
echo ($path[strlen($path)-1] == '/') ? basename($path) : basename(dirname($path));
Here is an example using parse_url() to override the specific part:
<?php
$arr = parse_url("http://www.domain.com/something/remove_me/?foo&bar");
$arr['path'] = "/something/";
printf("%s://%s%s?%s", $arr['scheme'], $arr['host'], $arr['path'], $arr['query']);
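The printf() call above assumes that scheme, host, path, and query are all present. PHP core has no built-in inverse of parse_url(), so a small helper can reassemble whatever parts exist. The function name `buildUrl` is hypothetical and this is only a sketch of one way to do it.

```php
<?php
// Hypothetical helper: rebuild a URL string from a parse_url() result.
function buildUrl(array $p): string
{
    $url = '';
    if (isset($p['scheme'])) {
        $url .= $p['scheme'] . '://';
    }
    if (isset($p['user'])) {
        $url .= $p['user'];
        if (isset($p['pass'])) {
            $url .= ':' . $p['pass'];
        }
        $url .= '@';
    }
    $url .= $p['host'] ?? '';
    if (isset($p['port'])) {
        $url .= ':' . $p['port'];
    }
    $url .= $p['path'] ?? '';
    if (isset($p['query'])) {
        $url .= '?' . $p['query'];
    }
    if (isset($p['fragment'])) {
        $url .= '#' . $p['fragment'];
    }
    return $url;
}

$parts = parse_url('http://www.domain.com/something/remove_me/?foo&bar');
$parts['path'] = '/something/';
echo buildUrl($parts); // http://www.domain.com/something/?foo&bar
```

Because every component is optional, the same helper also works for URLs without a query string, where the printf() version would raise an undefined-index notice.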
