Extract part from URL for a query string - php

I need a certain part of a URL extracted.
Example:
http://www.domain.com/blog/entry-title/?standalone=1 is the given URL.
blog/entry-title should be extracted.
However, the extraction should also work with http://www.domain.com/index.php/blog/[…] as the given URL.
This code is for a Content Management System.
What I've already come up with is this:
function getPathUrl() {
    $folder = explode('/', $_SERVER['SCRIPT_NAME']);
    $script_filename = pathinfo($_SERVER['SCRIPT_NAME']); // supposed to be 'index.php'
    $request = explode('/', $_SERVER['REQUEST_URI']);
    // first element is always ""
    array_shift($folder);
    array_shift($request);
    // now it's only the request URL, with containing folders and 'index.php' filtered out
    $final_request = array_diff($request, array_intersect($folder, $request));
    // array_diff() keeps the original keys; reindex the array
    $final_request = array_values($final_request);
    // remove empty elements in the array (caused by superfluous slashes, for instance);
    // array_clean() is not a PHP built-in, so it is presumably a custom helper
    array_clean($final_request);
    // make a string out of the array
    $final_request = implode('/', $final_request);
    if ($_SERVER['QUERY_STRING'] || substr($final_request, -1) == '?') {
        $final_request = substr($final_request, 0, - strlen($_SERVER['QUERY_STRING']) - 1);
    }
    return $final_request;
}
However, this code does not take care of the arguments at the end of the URL (like ?standalone=1). It works for anchors (#read-more), though.
Thanks a ton, guys, and have fun twisting your brains. Maybe this can be done with a regular expression.

There are many examples and details for what you want at:
http://php.net/manual/en/function.parse-url.php

That should do what you need:
<?php
function getPath($url)
{
    $path = parse_url($url, PHP_URL_PATH);
    $lastSlash = strrpos($path, "/");
    return substr($path, 1, $lastSlash - 1);
}
echo getPath("http://www.domain.com/blog/entry-title/?standalone=1");
?>
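The answer above returns everything up to the last slash, so it keeps a leading index.php segment and drops the last segment when the path has no trailing slash. A minimal sketch that handles both URLs from the question, assuming the front controller is always named index.php and using entry-title as an assumed example path (getPathWithoutScript() is a hypothetical name):

```php
<?php
// Hypothetical helper: path of a URL without query string, fragment,
// surrounding slashes, or a leading front-controller segment.
function getPathWithoutScript(string $url, string $script = 'index.php'): string
{
    // parse_url() gives null (no path) or false (malformed URL); cast both to ''.
    $path = trim((string) parse_url($url, PHP_URL_PATH), '/');

    // Strip the front controller when it is the first path segment.
    if ($path === $script) {
        return '';
    }
    if (strpos($path, $script . '/') === 0) {
        $path = substr($path, strlen($script) + 1);
    }
    return $path;
}

echo getPathWithoutScript('http://www.domain.com/blog/entry-title/?standalone=1'), "\n"; // blog/entry-title
echo getPathWithoutScript('http://www.domain.com/index.php/blog/entry-title/?standalone=1'), "\n"; // blog/entry-title
```

Because parse_url() separates the query string and fragment from the path, ?standalone=1 and #read-more never reach the string handling.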

Related

Remove subdomain from URL/host to match domains in affiliate link array

I want to make a redirect script in PHP that automatically adds affiliate tags to all links, the way https://freekaamaal.com/links?url=https://www.amazon.in/ works.
If I open the link above, it automatically adds the affiliate tag, and the final link that opens is ‘https://www.amazon.in/?tag=freekaamaal-21’. The same happens for Flipkart and many other sites.
It automatically adds affiliate tags to links for various stores, for example Amazon, Flipkart, Ajio, etc.
I'll be very thankful if anyone can help me with this.
Thanks in advance 🙏
Right now I've written the code below, but the problem is that some links have an extra subdomain, for example https://dl.flipkart.com/ or https://m.shopclues.com/. For these links, the script does not redirect using the array; it redirects to the default link instead.
<?php
$subid = isset($_GET['subid']) ? $_GET['subid'] : 'telegram'; // subid for external tracking
$affid = $_GET['url']; // main link
$parse = parse_url($affid);
$host = $parse['host'];
$host = str_ireplace('www.', '', $host);
// Flipkart affiliate link is generated here
$url_parts = parse_url($affid);
$url_parts['host'] = 'dl.flipkart.com';
$url_parts['path'] .= "/";
if (strpos($url_parts['path'], "/dl/") !== 0) $url_parts['path'] = '/dl' . rtrim($url_parts['path'], "/");
$url = $url_parts['scheme'] . "://" . $url_parts['host'] . $url_parts['path'] . (empty($url_parts['query']) ? '' : '?' . $url_parts['query']);
$afftag = "harshk&affExtParam1=$subid"; // our affiliate ID
if (strpos($url, '?') !== false) {
    if (substr($url, -1) == "&") {
        $url = $url . 'affid=' . $afftag;
    } else {
        $url = $url . '&affid=' . $afftag;
    }
} else { // start a new query string
    $url = $url . '?affid=' . $afftag;
}
$flipkartlink = $url;
// Amazon link is generated here
$amazon = $affid;
$amzntag = "subhdeals-21"; // our affiliate ID
if (strpos($amazon, '?') !== false) {
    if (substr($amazon, -1) == "&") {
        $amazon = $amazon . 'tag=' . $amzntag;
    } else {
        $amazon = $amazon . '&tag=' . $amzntag;
    }
} else { // start a new query string
    $amazon = $amazon . '?tag=' . $amzntag;
}
$amazonlink = $amazon;
$cueurl = "https://linksredirect.com/?subid=$subid&source=linkkit&url="; // Cuelinks deeplink for redirection
$ulpsub = '&subid=' . $subid; // subid
$encoded = urlencode($affid); // URL-encode
$home = $cueurl . $encoded; // default link for redirection
$partner = array( // insert links here
    "amazon.in" => "$amazonlink",
    "flipkart.com" => "$flipkartlink",
    "shopclues.com" => $cueurl . $encoded,
    "aliexpress.com" => $cueurl . $encoded,
    "ajio.com" => "https://ad.admitad.com/g/?ulp=$encoded$ulpsub",
    "croma.com" => "https://ad.admitad.com/g/?ulp=$encoded$ulpsub",
    "myntra.com" => "https://ad.admitad.com/g/?ulp=$encoded$ulpsub",
);
$store = array_key_exists($host, $partner) === false ? $home : $partner[$host]; // checks whether the host exists; if not, redirect to the default link
header("Location: $store"); // do not change
exit(); // do not change
?>
Thank you for updating your question with the code you have and explaining what the actual problem is. Since your reference array for the affiliate links is indexed by base domain, we need to normalize the hostname to remove any subdomains. Right now you have:
$host = str_ireplace('www.', '', $host);
Which will only do the job if the subdomain is www., obviously. Now, one might be tempted to simply explode by . and take the last two components. However, that would fail with .co.in and other second-level domains. We're better off using a regular expression.
One could craft a universal regular expression that handles all possible second-level domains (co., net., org., edu., ...), but that would become a long list. For your use case, since your list currently only has the .com, .in and .co.in domain extensions, and is unlikely to get many more, we'll just hard-code these into the regex to keep things fast and simple:
$host = preg_replace('#^.*?([^.]+\.)(com|in|co\.in)$#i', '\1\2', $host);
To explain the regex we're using:
^ start-of-subject anchor;
.*? ungreedy optional match for any characters (if a subdomain, or a sub-sub-domain, exists);
([^.]+\.) capturing group for non-. characters followed by . (main domain name);
(com|in|co\.in) capturing group for the domain extension (add to the list as necessary);
$ end-of-subject anchor
Then we replace the hostname with the contents of the capture groups that matched domain. and its extension. This returns example.com for www.example.com, foo.bar.example.com, or example.com; and example.co.in for www.example.co.in, foo.bar.example.co.in, or example.co.in. This should help your script work as intended. If there are further problems, please update the question and we'll see what solutions are available.
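A minimal sketch of how the normalized host could drive the partner lookup, assuming the .com, .in and .co.in extensions mentioned in this answer; normalizeHost() and pickRedirect() are hypothetical names, and the partner values in the usage are placeholders:

```php
<?php
// Hypothetical helper: reduce a hostname to "name.extension" for the
// extensions .com, .in and .co.in; other hosts are returned lowercased.
function normalizeHost(string $host): string
{
    return preg_replace('#^.*?([^.]+\.)(com|in|co\.in)$#i', '\1\2', strtolower($host));
}

// Hypothetical helper: look up the normalized host in the partner map,
// falling back to the default link when it is missing.
function pickRedirect(string $url, array $partner, string $default): string
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    return $partner[normalizeHost($host)] ?? $default;
}
```

With this, https://dl.flipkart.com/... and https://m.shopclues.com/... resolve to the flipkart.com and shopclues.com entries instead of the default link.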

Thinking about domain validation

This is my first question. Also, I am uncomfortable with regular expressions.
I was thinking about a PHP function that validates domains or URLs given by user input. (Sub)domains will be collected via an HTML input field.
So I have to deal with different formats like http(s)://domain.tld and domain.tld, both of which may include a path or be invalid.
The function should correct almost-correct user input rather than return false.
In the end, I want to return the format (sub.)domain.tld, but only for real existing domains.
My WIP-solution is the following. What do you think about it?
function valDomain($url, $prefix = "") {
    $url = trim($url);
    $url = str_replace(" ", "", $url);
    $url = trim($url, '.');
    $url = trim($url, '?');
    $url = trim($url, '-');
    $url = trim($url, '/');
    $url = strtolower($url);
    $url = substr($url, 0, 100);
    // strict comparison: strpos() returns 0 (falsy) for a match at position 0
    if (strpos($url, '.') === false) {
        return false;
    }
    if (strpos($url, 'http') !== false) {
        $x = parse_url($url);
        if (isset($x['host'])) {
            $url = $x['host'];
        }
    }
    if (strpos($url, '/') !== false) {
        $x = explode("/", $url);
        if (isset($x[0])) {
            $url = $x[0];
        }
    }
    if (checkdnsrr($url, "A")) {
        return $prefix . $url;
    } else {
        return false;
    }
}
For explanation: it tidies up the user input, checks whether it can be a URL/domain at all, takes the host if it's a proper URL, strips the path, and then, once only the bare domain is left, checks whether there is a DNS entry for it. Only if there is does it return the validated domain; otherwise it returns false.
Does this make sense?
(The $prefix argument can optionally be used to prepend http:// to the URL in order to render a hyperlink.)
Retrieved results will be stored in a database, so they need to be hack-safe.
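One possible alternative leans on PHP's built-in filters instead of hand-written trimming. This sketch assumes PHP 7.0 or later (for FILTER_VALIDATE_DOMAIN with FILTER_FLAG_HOSTNAME); normalizeDomain() is a hypothetical name, and the DNS lookup is optional so the syntax check can run without network access. A validated hostname should still be written to the database with a prepared statement.

```php
<?php
// Hypothetical helper: turn user input into a lowercase hostname, or false.
function normalizeDomain(string $input, bool $checkDns = false)
{
    $input = strtolower(trim($input));
    // Remove all whitespace, as the original function does with spaces.
    $input = preg_replace('/\s+/', '', $input);
    if ($input === '') {
        return false;
    }
    // Without a scheme, parse_url() treats "domain.tld/path" as a path,
    // so prepend one before extracting the host.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    $host = trim($host, '.');
    // Require at least one dot and a syntactically valid hostname
    // (alphanumerics and hyphens only, per FILTER_FLAG_HOSTNAME).
    if (strpos($host, '.') === false
        || filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME) === false) {
        return false;
    }
    // Optional existence check; this needs network access.
    if ($checkDns && !checkdnsrr($host, 'A')) {
        return false;
    }
    return $host;
}
```

Keeping the DNS check behind a flag separates "is this a well-formed hostname" from "does it currently resolve", which can change over time.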

PHP Array to string conversion with preg_match

I get an "Array to string conversion" error when using this script:
$pages = array('/about.php', '/');
//...............function text here................//
$ua = $_SERVER['HTTP_USER_AGENT'];
$mobiles = '/iphone|ipad|android|symbian|BlackBerry|HTC|iPod|IEMobile|Opera Mini|Opera Mobi|WinPhone7|Nokia|samsung|LG/i';
if (preg_match($mobiles, $ua)) {
    $thispage = $_SERVER["HTTP_HOST"].$_SERVER["REQUEST_URI"];
    if ($thispage == $_SERVER["HTTP_HOST"].$pages) {
        ob_start("text");
    }
}
This script changes the style of certain pages depending on the user's user agent, and I need it to work this way, but I don't know how to write it properly in PHP. Maybe I need something like "foreach ($pages as $i)"? But it didn't work the way I wrote it.
You are trying to check whether the requested resource, $_SERVER["REQUEST_URI"], is in a predefined list of resource paths. The error comes from $_SERVER["HTTP_HOST"].$pages, which concatenates a string with an array.
Change your condition as shown below (using the in_array function):
...
if (in_array($_SERVER["REQUEST_URI"], $pages)) {
    ob_start("text");
}
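Note that REQUEST_URI includes the query string, so /about.php?ref=home would not match '/about.php' in the list. A small sketch that compares only the path, assuming that is the intended behavior (isListedPage() is a hypothetical helper):

```php
<?php
// Hypothetical helper: true when the path part of the request URI
// (query string removed) is one of the listed pages, compared strictly.
function isListedPage(string $requestUri, array $pages): bool
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    return is_string($path) && in_array($path, $pages, true);
}
```

It could then serve as the inner condition: if (isListedPage($_SERVER["REQUEST_URI"], $pages)) { ob_start("text"); }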

phpacademy login/register clean url

I'm currently building my own CRM web application, and I followed Alex's YouTube tutorial on login/register using OOP.
In addition, I need my index.php to act as the dynamic content switcher: it only includes the header and footer, while the content is loaded from a folder that stores all the pages. I believe the end result should look like www.example.com/index.php?page=profile
I looked around, and it seems that what I'm doing is similar to the MVC pattern, where index is the root file and all the content is loaded from a view folder.
I managed to get everything done correctly but now instead of displaying the link like: www.example.com/user.php?name=jennifer
I wanted it to be www.example.com/user/name/jennifer
I tried looking on the phpacademy forum, but it seems to be abandoned. After some searching I found a topic relevant to what I want, but the code doesn't seem to work and I get the same error as the original poster.
Here is the code:
<?php
// Define the root of the site (this page should be in the root)
define('ROOT', rtrim(__DIR__, '/') . '/');
define('PAGES', ROOT . 'pages/');
// Define "safe" files that can be loaded
$safeFiles = ["login", "register", "profile", "changepassword"];
// Get URL
if (isset($_GET['page']) && !empty($_GET['page'])) {
    $url = $_GET['page'];
} else {
    $url = '/';
}
// Remove path traversal sequences
$sanitize = array(
    // Basic
    '..', "..\\", '../', "\\",
    // Percent encoding
    '%2e%2e%2f', '%2e%2e/', '..%2f', '%2e%2e%5c', '%2e%2e', '..%5c', '%252e%252e%255c', '..%255c',
    // UTF-8 encoding
    '%c1%1c', '%c0%af', '..%c1%9c'
);
$url = str_replace($sanitize, '', $url);
// Prevent Null byte (%00)
// PHP 5.6 + should take care of this automatically, but PHP 5.0 < ....
$url = str_replace(chr(0), '', $url);
// Filter URL
$url = filter_var($url, FILTER_SANITIZE_URL);
// Remove any extra slashes
$url = rtrim($url, '/');
// Make lowercase url
$url = strtolower($url);
// Check current page
$path = PAGES . $url . '.php';
// If the file is in our safe array & exists, load it!
if (in_array($url, $safeFiles) && file_exists($path)) {
    include($path);
} else {
    echo "404: Page not found!";
}
I searched Google but couldn't find a solution. I noticed other people have asked about this on this forum as well, so I hope someone can help me with this.
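A minimal sketch of the PHP side of clean URLs, assuming the web server is already configured (for example with an Apache rewrite rule, not shown) to forward /user/name/jennifer to index.php?page=user/name/jennifer. parseCleanPath() is a hypothetical helper that splits that path into a page name and key/value parameters:

```php
<?php
// Hypothetical helper: "user/name/jennifer" -> page "user", params ["name" => "jennifer"].
function parseCleanPath(string $page): array
{
    // Drop empty segments caused by leading, trailing or doubled slashes.
    $segments = array_values(array_filter(explode('/', $page), 'strlen'));
    $name = strtolower(array_shift($segments) ?? '');
    $params = [];
    // Remaining segments are read as key/value pairs; a dangling key gets ''.
    for ($i = 0; $i < count($segments); $i += 2) {
        $params[$segments[$i]] = $segments[$i + 1] ?? '';
    }
    return ['page' => $name, 'params' => $params];
}
```

The returned page name would still need to pass the in_array($url, $safeFiles) check before anything is included.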

How do you strip out the domain name from a URL in php?

I'm looking for a method (or function) to strip out the domain.ext part of any URL that's fed into the function. The domain extension can be anything (.com, .co.uk, .nl, .whatever), and the URL that's fed into it can be anything from http://www.domain.com to www.domain.com/path/script.php?=whatever
What's the best way to go about doing this?
parse_url turns a URL into an associative array:
php > $foo = "http://www.example.com/foo/bar?hat=bowler&accessory=cane";
php > $blah = parse_url($foo);
php > print_r($blah);
Array
(
[scheme] => http
[host] => www.example.com
[path] => /foo/bar
[query] => hat=bowler&accessory=cane
)
You can also write a regular expression to get exactly what you want.
Here is my attempt at it:
$pattern = '/\w+\..{2,3}(?:\..{2,3})?(?:$|(?=\/))/i';
$url = 'http://www.example.com/foo/bar?hat=bowler&accessory=cane';
if (preg_match($pattern, $url, $matches) === 1) {
echo $matches[0];
}
The output is:
example.com
This pattern also takes into consideration domains such as 'example.com.au'.
Note: I have not consulted the relevant RFC.
You can use parse_url() to do this:
$url = 'http://www.example.com';
$domain = parse_url($url, PHP_URL_HOST);
$domain = str_replace('www.','',$domain);
In this example, $domain should contain example.com, irrespective of it having www or not. It also works for a domain such as .co.uk
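str_replace('www.', '', ...) removes "www." anywhere in the host, not just at the start. A sketch that strips only a leading "www." (hostWithoutWww() is a hypothetical name); note that parse_url() only reports a host when the URL has a scheme, so a bare www.example.com returns null here:

```php
<?php
// Hypothetical helper: host of a URL with only a leading "www." removed.
// Returns null when parse_url() finds no host (e.g. no scheme given).
function hostWithoutWww(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null;
    }
    return preg_replace('/^www\./i', '', $host);
}
```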
The following code strips the protocol, domain and port from an absolute URL:
$urlWithoutDomain = preg_replace('#^.+://[^/]+#', '', $url);
Here are a couple simple functions to get the root domain (example.com) from a normal or long domain (test.sub.domain.com) or url (http://www.example.com).
/**
 * Get root domain from full domain
 * @param string $domain
 */
public function getRootDomain($domain)
{
    $domain = explode('.', $domain);
    $tld = array_pop($domain);
    $name = array_pop($domain);
    $domain = "$name.$tld";
    return $domain;
}
/**
 * Get domain name from url
 * @param string $url
 */
public function getDomainFromUrl($url)
{
    $domain = parse_url($url, PHP_URL_HOST);
    $domain = $this->getRootDomain($domain);
    return $domain;
}
Solved this...
Say we're calling dev.mysite.com and we want to extract 'mysite.com'
$requestedServerName = $_SERVER['SERVER_NAME']; // = dev.mysite.com
$thisSite = explode('.', $requestedServerName); // site name now an array
array_shift($thisSite); //chop off the first array entry eg 'dev'
$thisSite = join('.', $thisSite); //join it back together with dots ;)
echo $thisSite; //outputs 'mysite.com'
Works with dev.mysite.co.uk too, as long as there is exactly one subdomain label to remove :)
I spent some time thinking about whether it makes sense to use a regular expression for this, but in the end I think not.
firstresponder's regexp came close to convincing me it was the best way, but it didn't work on anything missing a trailing slash (so http://example.com, for instance). I fixed that with the following: '/\w+\..{2,3}(?:\..{2,3})?(?=[\/\W])/i', but then I realized that matches twice for urls like 'http://example.com/index.htm'. Oops. That wouldn't be so bad (just use the first one), but it also matches twice on something like this: 'http://abc.ed.fg.hij.kl.mn/', and the first match isn't the right one. :(
A co-worker suggested just getting the host (via parse_url()), and then taking the last two or three array elements (explode() on '.'). Whether it's two or three would be based on a list of domains, like 'co.uk', etc. Making up that list becomes the hard part.
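A sketch of the co-worker's idea, assuming a small hand-maintained list of multi-part suffixes. MULTI_PART_SUFFIXES and registrableDomain() are hypothetical names, and the three suffixes are only a sample; as the next answer points out, the Public Suffix List is the complete source for such a list:

```php
<?php
// Hypothetical sample of multi-part suffixes; a real list would be much longer.
const MULTI_PART_SUFFIXES = ['co.uk', 'com.au', 'co.in'];

// Keep the last two host labels, or three when the last two form a known
// multi-part suffix. Returns null when parse_url() finds no host.
function registrableDomain(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $labels = explode('.', strtolower(trim($host, '.')));
    $lastTwo = implode('.', array_slice($labels, -2));
    $keep = in_array($lastTwo, MULTI_PART_SUFFIXES, true) ? 3 : 2;
    // array_slice() with a negative offset larger than the array returns all of it.
    return implode('.', array_slice($labels, -$keep));
}
```

Unlike the regex attempts, this never produces a second match: it works on the single host string that parse_url() extracts.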
There is only one correct way to extract domain parts: use the Public Suffix List (a database of TLDs). I recommend the TLDExtract package; here is sample code:
$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('www.domain.com/path/script.php?=whatever');
$result->getSubdomain(); // will return (string) 'www'
$result->getHostname(); // will return (string) 'domain'
$result->getSuffix(); // will return (string) 'com'
This function should work:
function Delete_Domain_From_Url($Url = false)
{
    if ($Url)
    {
        $Url_Parts = parse_url($Url);
        $Url = isset($Url_Parts['path']) ? $Url_Parts['path'] : '';
        $Url .= isset($Url_Parts['query']) ? "?" . $Url_Parts['query'] : '';
    }
    return $Url;
}
To use it:
$Url = "https://stackoverflow.com/questions/176284/how-do-you-strip-out-the-domain-name-from-a-url-in-php";
echo Delete_Domain_From_Url($Url);
# Output:
#/questions/176284/how-do-you-strip-out-the-domain-name-from-a-url-in-php
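Delete_Domain_From_Url() keeps only the path and query, so a fragment such as #top is dropped. If it should be kept, a variant could look like this (pathQueryFragment() is a hypothetical name):

```php
<?php
// Hypothetical variant: path + query + fragment of a URL, without
// scheme, host or port. Returns '' for URLs parse_url() cannot parse.
function pathQueryFragment(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return '';
    }
    $result = $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $result .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $result .= '#' . $parts['fragment'];
    }
    return $result;
}
```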
