Reduce link (URL) size - php

Is it possible to reduce the size of a link (in text form) by PHP or JS?
E.g. I might have links like these:
http://www.example.com/index.html <- Redirects to the root
http://www.example.com/folder1/page.html?start=true <- Redirects to page.html
http://www.example.com/folder1/page.html?start=false <- Redirects to page.html?start=false
The purpose is to find out whether the link can be shortened and still point to the same location. In these examples the first two links can be reduced, because the first points to the root, and the second has parameters that can be omitted.
The third link is the case where the parameters can't be omitted, meaning that it can't be reduced further than removing the http://.
So the above links would be reduced like this:
Before: http://www.example.com/index.html
After: www.example.com
Before: http://www.example.com/folder1/page.html?start=true
After: www.example.com/folder1/page.html
Before: http://www.example.com/folder1/page.html?start=false
After: www.example.com/folder1/page.html?start=false
Is this possible by PHP or JS?
Note:
www.example.com is not a domain I own or have access to besides through the URL. The links are potentially unknown, and I'm looking for something like an automatic link shortener that can work by getting the URL and nothing else.
Actually I was thinking of something like a linkchecker that could check if the link works before and after the automatic trim, and if it doesn't then the check will be done again at a less trimmed version of the link. But that seemed like overkill...

Since you want to do this automatically, and you don't know how the parameters change the behaviour, you will have to do this by trial and error: try to remove parts from a URL, and see if the server responds with a different page.
In the simplest case this could work somehow like this:
<?php
$originalUrl = "http://stackoverflow.com/questions/14135342/reduce-link-url-size";
$originalContent = file_get_contents($originalUrl);
$trimmedUrl = $originalUrl;
while ($trimmedUrl) {
    // dirname() drops the last path segment of the URL
    $trialUrl = dirname($trimmedUrl);
    $trialContent = file_get_contents($trialUrl);
    // Stop as soon as a request fails or the page differs
    if ($trialContent !== false && $trialContent == $originalContent) {
        $trimmedUrl = $trialUrl;
    } else {
        break;
    }
}
echo "Shortest equivalent URL: " . $trimmedUrl;
// output: Shortest equivalent URL: http://stackoverflow.com/questions/14135342
?>
For your usage scenario, your code would be a bit more complicated, as you would have to test for each parameter in turn to see if it is necessary. For a starting point, see the parse_url() and parse_str() functions.
A word of caution: this code is very slow, as it performs many requests for every URL you want to shorten. Also, it will likely fail to shorten many URLs, because the server might include things like timestamps in the response, so two fetches of the same page are not byte-identical. This makes the problem very hard, and that's the reason why companies like Google have many engineers who think about stuff like this :).
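The per-parameter test mentioned above could be sketched roughly like this in JavaScript. This is only an illustration under assumptions: `fetchBody` is a hypothetical function you supply yourself (for example a thin wrapper around an HTTP client) that resolves to the page body as a string, and "same page" is taken to mean a byte-identical body, which has the timestamp problem described above.

```javascript
// Try dropping each query parameter in turn and keep the drop only if the
// response body is unchanged. `fetchBody` is a hypothetical, caller-supplied
// async function: (urlString) => Promise<string>.
async function dropUnneededParams(originalUrl, fetchBody) {
  const reference = await fetchBody(originalUrl);
  let current = new URL(originalUrl);

  // Snapshot the distinct parameter names before modifying anything.
  for (const name of [...new Set(current.searchParams.keys())]) {
    const trial = new URL(current.href);
    trial.searchParams.delete(name);

    let body;
    try {
      body = await fetchBody(trial.href);
    } catch (err) {
      continue; // request failed, so keep the parameter
    }
    if (body === reference) {
      current = trial; // same page without it, so the parameter is redundant
    }
  }
  return current.href;
}
```

Each parameter costs one extra request, so a URL with n parameters needs n + 1 fetches.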

Yea, that's possible:
JS:
var url = 'http://www.example.com/folder1/page.html?start=true';
url = url.replace('http://','').replace('?start=true','').replace('/index.html','');
php:
$url = 'http://www.example.com/folder1/page.html?start=true';
$url = str_replace(array('http://', '?start=true', '/index.html'), "", $url);
(Each item in the array() will be replaced with "")

Here is a JS for you.
function trimURL(url, trimToRoot, trimParam) {
    // Strip the scheme; leave the URL untouched if it has none
    var match = /^https?:\/\/(.*)/.exec(url);
    if (match) {
        url = match[1];
    }
    if (trimParam === true) {
        url = url.split('?')[0]; // drop the query string
    }
    if (trimToRoot === true) {
        url = url.split('/')[0]; // keep only the host
    }
    return url;
}
alert(trimURL('https://www.google.com/one/two.php?f=1'));              // www.google.com/one/two.php?f=1
alert(trimURL('https://www.google.com/one/two.php?f=1', true));        // www.google.com
alert(trimURL('https://www.google.com/one/two.php?f=1', false, true)); // www.google.com/one/two.php
Fiddle: http://jsfiddle.net/5aRpQ/

Related

WordPress: append query string to all URL's

A user will be directed from a website to a landing page that will have a query string in the URL i.e. www.sitename.com?foo=bar&bar=foo. What I want to do, is then append that query string to all links on the page, preferably whether they were generated by WordPress or not (i.e. hard coded or not) and done server-side.
The reason is that their goal destination has to have the query string in the URL. I could use cookies, but I'd rather not, since that would bring many other problems for my specific use case.
I have explored the possibility of using .htaccess in conjunction with $_SERVER['QUERY_STRING'] to no avail. My understanding of .htaccess isn't great, but in my mind I assumed it would be possible to rewrite the current URL to be current URL + the variable that stores $_SERVER['QUERY_STRING'].
I've also explored add_rewrite_rule but couldn't find a logical way to achieve what I want.
Here's the Javascript solution I have, but as I said, I'd like a server-side solution:
const links = document.querySelectorAll('a');
links.forEach(link => {
    if (!link.host.includes(location.host)) {
        return;
    }
    const url = new URL(link.href);
    const combined = Array.from(url.searchParams.entries()).reduce((agg, [key, val]) => {
        agg.set(key, val);
        return agg;
    }, (new URL(location.href)).searchParams);
    const nextUrl = [link.protocol, '//', link.host, link.pathname].join('');
    link.href = (new URL(`${nextUrl}?${combined.toString()}`)).toString();
});

Redirecting takes place even to wrong urls

I have written a script to redirect the users who visit my website:
http://localhost/ghi/red.php?go=http://www.google.com
When there's a URL like the one above, my script grabs the value of the go variable and checks whether it is in my database table of trusted sites; if so, it redirects to that site. In this case the redirection should take place even for sub domains:
for example, even if the go variable has a value like www.google.com/images, the redirection should take place if www.google.com is in the trusted sites table.
I do that with PHP's strrpos() function, as below:
$pos = strrpos($trusted_sites, $go_value);
This works fine, but there is a problem that I accidentally came across:
even if the go variable has a value like www.google.comqwsdad, it still redirects the user to www.google.com.
This is a serious bug. Any help on how to avoid redirecting to wrong URLs would be highly appreciated.
If you want to redirect only to a whitelist of sites, first build the whitelist as an array. Then you can check the host taken from $_GET['go'] against it using in_array(). Consider this example:
// sample: http://localhost/ghi/red.php?go=http://www.google.com/images
if (isset($_GET['go'])) {
    $go = $_GET['go'];
    $url = parse_url($go);
    // parse_url() returns false for malformed URLs and omits missing parts
    $host = isset($url['host']) ? $url['host'] : '';
    $scheme = isset($url['scheme']) ? $url['scheme'] : '';
    $certified_sites = array('www.imdb.com', 'www.tomshardware.com', 'www.stackoverflow.com', 'www.tizag.com', 'www.google.com');
    if (in_array($host, $certified_sites)) {
        // note: this redirects to the site root; the path is dropped
        header("Location: $scheme://$host");
        exit;
    } else {
        // i will not redirect
    }
}
The "correct" way is to use an array of sites (or even a database), then use in_array.
<?php
$trusted_sites = array("http://www.google.com", "http://www.yahoo.com");
if (in_array("http://www.google.com", $trusted_sites)) {
    print "Ok\n";
} else {
    print "Bad site\n";
}
A quick way of cheating, which I use from time to time, is to make sure you have a separator (e.g. a space) as the first and last character of your $trusted_sites, then add the separator to the beginning and end of your $go_value.
<?php
$trusted_sites = "http://www.google.com http://www.yahoo.com";
$go = "http://www.google.com";
if (strpos(" $trusted_sites ", " $go ") === false) {
    print "Bad site\n";
} else {
    print "Ok\n";
}
In this example, I've added the separator (a space) to the beginning and end of both variables, inside the strpos(); in the case of $trusted_sites, I could have put them in the initial declaration instead.
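Both answers come down to comparing the whole host name rather than searching for a substring. A minimal JavaScript sketch of that idea follows, which also keeps the path so that www.google.com/images still works. The names `TRUSTED_HOSTS` and `safeRedirectTarget` are made up for illustration; in PHP, parse_url($go, PHP_URL_HOST) would play the role of `URL.hostname`.

```javascript
// Hypothetical whitelist of exact host names.
const TRUSTED_HOSTS = new Set(['www.google.com', 'www.yahoo.com']);

// Return the URL to redirect to, or null if the target is not trusted.
function safeRedirectTarget(go) {
  let url;
  try {
    url = new URL(go); // throws for anything that is not an absolute URL
  } catch (err) {
    return null;
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return null; // refuse javascript:, data:, etc.
  }
  // Exact match on the parsed host, so "www.google.comqwsdad" fails.
  return TRUSTED_HOSTS.has(url.hostname) ? url.href : null;
}

console.log(safeRedirectTarget('http://www.google.com/images')); // http://www.google.com/images
console.log(safeRedirectTarget('http://www.google.comqwsdad'));  // null
```

Because the host is parsed out before the comparison, tricks such as http://www.google.com@evil.com/ are rejected as well: the parsed host there is evil.com.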

Issue with & in a string submitted with $_GET

I'm building an "away" page for my website: when a user posts a link to another website, each visitor clicking that link is first redirected to away.php, which tells them that I am not responsible for the content of the linked website.
The code in away.php to fetch the incoming browser URI is:
$goto = $_GET['to'];
So far it works, however there's a logical issue with dynamic URIs. For example,
www.mydomain.com/away.php?to=http://example.com
is working, but dynamic URIs like
www.mydomain.com/away.php?to=http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0
aren't working, since there is a & in the linked URL, which ends the $_GET['to'] string too early.
The $goto variable contains only the part up to the first &:
echo $_GET['to'];
===> "http://www.youtube.com/watch?feature=fvwp"
I understand why, but I'm looking for a solution, since I haven't found one on the internet yet.
Try using urlencode:
$link = urlencode("http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0") ;
echo $link;
The function converts special URL characters into percent-encoded sequences that can safely be carried inside a query parameter.
The result looks like this and can be appended to a GET parameter:
http%3A%2F%2Fwww.youtube.com%2Fwatch%3Ffeature%3Dfvwp%26v%3Dj1p0_R8ZLB0
To get the special characters back (for example to output the link) there is the urldecode function; note that PHP already decodes the values it puts in $_GET.
The htmlentities function may also be useful when printing the link into HTML.
You can test with this:
$link = urlencode("http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0");
$redirect = "{$_SERVER['PHP_SELF']}?to={$link}";
if (!isset($_GET['to'])) {
    header("Location: $redirect");
} else {
    echo $_GET['to'];
}
EDIT:
Ok, I have got a solution for your particular situation.
This solution will work only if the parameter to is the last one in the query string:
// Match against the raw, undecoded query string of the incoming request
if (preg_match("/to=(.+)/", $_SERVER['QUERY_STRING'], $parts)) { // We got a parameter "to"
    echo $parts[1]; // Everything after "to="
}
So, $parts[1] will be your link.
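The encoding round trip can be illustrated in JavaScript, where encodeURIComponent plays the role of PHP's urlencode. This is only a sketch: `buildAwayLink` and `readTarget` are hypothetical helper names, and the away.php address is the one from the question.

```javascript
// Build an "away" link whose `to` parameter survives '&' and '?' in the target.
function buildAwayLink(awayPage, target) {
  return awayPage + '?to=' + encodeURIComponent(target);
}

// Read the target back out of such a link; searchParams decodes it again.
function readTarget(awayLink) {
  return new URL(awayLink).searchParams.get('to');
}

const target = 'http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0';
const link = buildAwayLink('http://www.mydomain.com/away.php', target);
console.log(link);
console.log(readTarget(link) === target); // true
```

The important part is that the encoding happens where the link is generated, not in away.php: once an unencoded & has reached the server, the query string has already been split.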

Differentiate pages with same url

I have a script that runs on two different pages, one for orders and one for quotes. These pages have an identical url followed by a dynamic string. What can I do to have this script do one thing on one page and one thing on another?
Edit: I wasn't very clear on this looking back, the current selected answer does work well for what I asked, however it shouldn't be used with Magento. Magento has built in methods for determining this information, and you would want to override it rather than inject script into the adminhtml code.
Look at the parameters from the URL via $_REQUEST in PHP. See here: http://php.net/manual/en/reserved.variables.request.php
EDIT:
I see from your comments that your URL is like http://www.example.com/index.php/admin/sales_order/view/order_id/273151/.
If it's always this way without any query parameters, then you may want to parse the $_SERVER['PATH_INFO'] variable in PHP.
(see here: http://php.net/manual/en/reserved.variables.server.php).
You can get an array of these path parts by doing:
$myPathArray = explode('/', trim($_SERVER['PATH_INFO'], '/'));
Then you can get that last, differentiating part of the path like this:
if (count($myPathArray)) {
    $orderId = $myPathArray[count($myPathArray)-1];
} else {
    $orderId = ''; // or whatever you please
}
You can check what's in the URL, for example with this:
var getUrlParameter = function(name, defaultValue) {
    name = name.replace(/[\[]/,"\\\[").replace(/[\]]/,"\\\]");
    var regexS = "[\\?&]"+name+"=([^&#]*)";
    var regex = new RegExp( regexS );
    var results = regex.exec( document.location.href );
    if( results == null ) return defaultValue;
    else return results[1];
};
If your URL is test.php?a=toto then you'll have toto in pageA:
var pageA = getUrlParameter("a");
EDIT : if you just want the end of the path part, look at document.location.pathname
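For the path-based case, a small sketch of pulling the last path segment out of a URL; `lastPathSegment` is a hypothetical helper name, and in a browser you would pass it document.location.href.

```javascript
// Return the last non-empty segment of a URL's path, or '' if there is none.
function lastPathSegment(href) {
  const segments = new URL(href).pathname.split('/').filter(Boolean);
  return segments.length ? segments[segments.length - 1] : '';
}

console.log(lastPathSegment(
  'http://www.example.com/index.php/admin/sales_order/view/order_id/273151/'
)); // 273151
```

Filtering out empty strings matters here because the example URL ends in a slash, which would otherwise leave an empty last element.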
If your URL variable names are different on each page, you could use an if statement:
if (isset($_GET['vara'])) {
    // Do thing A
}
elseif (isset($_GET['varb'])) {
    // Do thing b
}
Use $_GET to retrieve variables passed in the URL. If you're new at this, Tizag has some easy-to-read tutorials. With the values of the variables being passed, you can figure out which page you are coming from.

Multiple $_GET through links

I'm building a website with pagination: you click on links and they take you to the page you need. The links pass a $_GET variable ( a href="?pn=2" ) and that works fine.
However, when I add category links to the same page, which also contain a $_GET variable ( a href="?sort=english" ) and sort the content on the page, and click one, the system simply overrides the URL and deletes all the previous $_GET variables.
For example, I'm on page 2 (http://website.com/index.php?pn=2)
and then I click this sorting link. What I'm expecting to get is this (http://website.com/index.php?pn=2&sort=english), but what I get is this:
(http://website.com/index.php?sort=english). It simply overrides the previous $_GET instead of adding to it!
A relative URI consisting of just a query string will replace the entire existing query string. There is no way to write a URL that will add to an existing query. You have to write the complete query string that you want.
You can maintain the existing string by adding it explicitly:
href="?foo=<?php echo htmlspecialchars($_GET['foo']); ?>&bar=123"
Try using this:
$_SERVER['REQUEST_URI'];
On this link you can see examples. And on this link I have uploaded test document where you can try it yourself, it just prints out this line from above.
EDIT: Although this can help you get the current parameters in the URL, I think it's not a solution for you. Like Quentin said, you will have to write the full link manually and maintain each parameter.
You could create a function that will iterate through your $_GET array and create a query string. Then all you would have to do is change your $_GET array and generate this query string.
A sketch (I don't really know PHP, but this should be easy to follow):
function create_query_string($array) {
    $kvps = array();
    foreach ($array as $key => $value) {
        // urlencode() keeps characters like & and = inside a value intact
        array_push($kvps, urlencode($key) . "=" . urlencode($value));
    }
    return "?" . implode("&", $kvps);
}
Usage:
$_GET["sort"] = "english";
$query_string = create_query_string($_GET);
You need to maintain the query parameters when you create the new links. The links on the page should be something like this:
<a href="?pn=2&sort=english">Sort by English</a>
The HTTP protocol is stateless -- it doesn't remember the past. You have to remind it of what the previous HTTP parameters were via PHP or other methods (cookies, etc). In your case, you need to remind it what the current page number is, as in the example above.
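The "keep what is there, then add or override" step that these answers describe can be sketched in JavaScript with URLSearchParams; `withParams` is a hypothetical helper name. On the PHP side, http_build_query(array_merge($_GET, array('sort' => 'english'))) should give the equivalent result.

```javascript
// Return a "?..." query string that keeps the current parameters and
// adds or overrides the given ones.
function withParams(currentSearch, changes) {
  const params = new URLSearchParams(currentSearch);
  for (const [key, value] of Object.entries(changes)) {
    params.set(key, value); // replaces an existing value, otherwise appends
  }
  return '?' + params.toString();
}

console.log(withParams('?pn=2', { sort: 'english' }));           // ?pn=2&sort=english
console.log(withParams('?pn=2&sort=english', { pn: 3 }));        // ?pn=3&sort=english
```

In a browser the first argument would be location.search, so every pagination or sorting link can be generated from the current URL.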
