Clear strange parameters from file_get_contents


My client talks to the database through a service. When I perform an action, I call the service URL with file_get_contents and then process the response.
I think that when a user enters the site from a Google search result, Google adds some parameters to the service URL that I pass to file_get_contents.
For example,
as it should be,
file_get_contents(service.domain/service_name?param1=0)
but Google adds some strange parameters at the end of the URL, like this:
file_get_contents(service.domain/service_name?param1=0&force=1)
I have seen two or three of these strange parameters so far:
force = 1
gclid=CNC-jvTapbcCFaaj
I don't know how to handle and remove these parameters.
How can I remove these strange parameters before calling the URL with file_get_contents?
This is my function:
public function getWhere($keyword)
{
$locale = session('locale');
if($locale != "en" && $locale != "")
{
$locale = "_".session('locale');
}else{
$locale = "";
}
$page = file_get_contents(getenv('API_URL')."station_info?search.url=".$keyword);
$page = json_decode($page);
$page = (array) $page->rows;
$station = $page[0];
if($station->search->status == 1)
{
return redirect('/');
}
return view('station', compact('station', 'locale'));
}
The strange parameters attach themselves to the end of the URL.
Edit
I think someone is trying to do something. $keyword must be a string containing the station name, for example London.
But someone is trying to send London&force=1, so I have to check the $keyword variable.

You can parse the URL with the parse_url function, remove the unwanted parameters (or just keep the parameters you want), and build the URL back.
After parse_url you also need to parse the query string with parse_str, so the code will look like this:
// parse_url only fills in 'scheme' and 'host' when the URL includes a scheme
$urlParsed = parse_url("http://service.domain/service_name?param1=0&force=1");
// parse_str returns nothing; it writes the parameters into its second argument
parse_str($urlParsed['query'], $params);
$newParams = [];
$neededParams = ['param1', 'param2'];
foreach ($neededParams as $p) {
if (isset($params[$p])) {
$newParams[$p] = $params[$p];
}
}
$newUrl = $urlParsed['scheme'] . '://' . $urlParsed['host'] . $urlParsed['path'] . '?' . http_build_query($newParams);
file_get_contents($newUrl);
I didn't test that code, but I hope the idea is clear and you can adapt it to your needs.
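Here is a minimal sketch of the same whitelist idea wrapped in a function, plus one way to deal with the London&force=1 case from the question's edit. The names keepAllowedParams and buildStationUrl, and the http://api.example/ base URL in the usage note, are made up for illustration; the sketch also assumes the URL carries a scheme and it drops any port.

```php
<?php
// Hypothetical helper: keep only the whitelisted query parameters of a URL.
function keepAllowedParams(string $url, array $allowed): string
{
    $parts = parse_url($url);
    $params = [];
    parse_str($parts['query'] ?? '', $params);

    // Keep only the keys that appear in the whitelist.
    $kept = array_intersect_key($params, array_flip($allowed));

    $base = ($parts['scheme'] ?? 'http') . '://' . ($parts['host'] ?? '') . ($parts['path'] ?? '');
    return $kept ? $base . '?' . http_build_query($kept) : $base;
}

// Hypothetical helper: encode the keyword so that an injected "&force=1"
// stays part of the value instead of becoming a separate parameter.
function buildStationUrl(string $apiUrl, string $keyword): string
{
    return $apiUrl . 'station_info?search.url=' . rawurlencode($keyword);
}
```

With these, buildStationUrl('http://api.example/', 'London&force=1') sends the whole string as the search value, so the service sees no extra parameter. Note that parse_str rewrites dots in parameter names to underscores, which is why the search.url name is concatenated by hand here rather than passed through parse_str.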

Related

How to deal with unencoded URL redirects to my website correctly?

We are using CleverReach to redirect people to our website after they have confirmed their mail account via double opt-in. We pass the email as a query parameter to our website, like: example.com/thanks?email=foo@bar.com (by setting up a redirect in the CleverReach backend like example.com/thanks?email={EMAIL}). Apparently, the email parameter doesn't get urlencoded by CleverReach.
Now, in Drupal, if the URL is example.com/thanks?email=hello+world@bar.com and I use this code:
$request = \Drupal::request();
$email = $request->query->get('email');
$email is hello world@bar.com. Now, I don't know what the correct processing is here. Obviously, I can't tell CleverReach to urlencode their redirects beforehand. I don't even know if that would be best practice or if I need to implement something...
The only thing I found out is that $_SERVER['QUERY_STRING'] contains the "real" string, which I can urlencode and then redirect, and then, by reading the query params, urldecode them. But I feel like I am missing some crucial inbuilt functionality.
TL;DR
If a website redirects to my website using query params that are not urlencoded, how do I read them?
My current approach:
<?php
public function redirectIfIllegalUri() {
$request = \Drupal::request();
$email = $request->query->get('email', '');
$needsRedirect = (false !== strpos($email, ' ') || false !== strpos($email, '@'));
if ($needsRedirect && isset($_SERVER['QUERY_STRING']) && false !== strpos($_SERVER['QUERY_STRING'], 'email=')) {
$sqs = $_SERVER['QUERY_STRING'];
$sqs = htmlspecialchars($sqs);
$sqs = filter_var($sqs, FILTER_SANITIZE_STRING);
$sqs = filter_var($sqs, FILTER_SANITIZE_ENCODED);
$sqs = urldecode($sqs);
$sqs = explode('&', $sqs);
foreach ($sqs as $queryParam) {
if (false === strpos($queryParam, 'email=')) continue;
$values = explode('=', $queryParam);
$email = $values[1];
}
$emailEncoded = urlencode($email);
$query = $request->query->all();
$query['email'] = $emailEncoded;
$refreshUrl = Url::fromRoute('<current>');
$refreshUrl->setOptions([
'query' => $query,
]);
$response = new RedirectResponse($refreshUrl->toString(), 301);
$response->send();
return;
}
}
$request = \Drupal::request();
$email = urldecode($request->query->get('email', false));
drupal request() docs

The problem you are facing is that the + is treated as a space when you read the value from the $_GET global variable.
PHP currently has no built-in method that returns these values without urldecoding them, so you need to write a custom function to achieve what you are asking.
A simple function that returns the input without decoding it:
function get_params() {
$getData = $_SERVER['QUERY_STRING'] ?? '';
$getParams = explode('&', $getData);
$getParameters = [];
foreach ($getParams as $getParam) {
// limit to 2 parts so a value containing '=' stays intact
$parsed = explode('=', $getParam, 2);
$getParameters[$parsed[0]] = $parsed[1] ?? '';
}
return $getParameters;
}
This solution can be used if you do not have any other option. With this function you always get the data exactly as it was sent, without any decoding.
If you can encode the value on the CleverReach side, then the best approach is to encode it there.
Encoding the value in CleverReach for the email hello+world@bar.com gives you the URL example.com/thanks?email=hello%2Bworld%40bar.com, and in $_GET the email will contain the + sign.
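If the sender cannot be changed, a middle ground is to decode percent-escapes but leave the + alone. This is only a sketch: rawQueryParam is a made-up name, and it relies on rawurldecode(), which, unlike urldecode(), does not turn + into a space.

```php
<?php
// Hypothetical helper: read one parameter from a raw query string,
// decoding %XX sequences but keeping a literal '+' as '+'.
function rawQueryParam(string $queryString, string $name): ?string
{
    foreach (explode('&', $queryString) as $pair) {
        if ($pair === '') {
            continue;
        }
        // Split on the first '=' only; a missing value becomes ''.
        [$key, $value] = array_pad(explode('=', $pair, 2), 2, '');
        if (rawurldecode($key) === $name) {
            return rawurldecode($value);
        }
    }
    return null; // parameter not present
}
```

In a request you would call it as rawQueryParam($_SERVER['QUERY_STRING'] ?? '', 'email'). The trade-off is that a sender who does encode spaces as + would now be read literally, so this only fits a source that is known not to encode.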

PHP - URL gets malformed during redirect

So, I have an image link that has this href:
http://www.app.com/link?target=www.target.com&param1=abc&param2=xyz
This is processed like so (I use laravel):
function out (Request $request) {
$url = $request->target;
$qs = $request->except('target');
if ( !empty($qs) ) {
$url .= strpos($url, '?') !== false ? '&' : '?';
$url .= http_build_query($qs);
}
return redirect($url);
}
Most of the time, this works. However, lately, we've been experiencing an issue where param1 and param2 are attached to the URL in a seemingly infinite loop causing us to hit a 414 Request URI too long Error.
The problem is that it happens so randomly that I really don't know where to check because I added a checker before the return statement.
if ( substr_count($url, 'param1') > 1 ) {
$file = storage_path() . '/logs/logger.log';
$log = "[ " . date("d-m-Y H:i:sa") . " ] [ {$request->ip()} ] - {$url} \n";
file_put_contents($file, $log, FILE_APPEND);
}
And it hasn't logged a single hit. Even after our testers experienced the bug.
Is it possible that the receiving application is breaking the URL somehow?
What information should I be looking out for? Have you seen an issue like this before?
Is it the http_build_query that could be causing this and that my checker just doesn't work as expected (though, I did test it and it logged my test URL).
Any help on the matter would be great.

Assuming an issue with http_build_query:
Well, one thing you can try is to rewrite the code without $request->except and http_build_query.
If you don't have any special reason to use http_build_query, I would suggest using $request->input.
Example with $request->input:
function out (Request $request) {
$url = $request->target;
$param1 = $request->input('param1', '');
$param2 = $request->input('param2', '');
if (!empty($param1) || !empty($param2)) {
$url .= '?';
}
if (!empty($param1) && !empty($param2)) {
$url .= 'param1=' . $param1 . '&param2=' . $param2;
} else {
$url .= !empty($param1) ? 'param1=' . $param1 : '';
$url .= !empty($param2) ? 'param2=' . $param2 : '';
}
return redirect($url);
}
The solution is a little more verbose, but with it you can be sure that this code is not what generates the redundancy.
Absurd, remote possibility:
The second thing I would try is to check your logs. For instance, if you are running under Apache you should have a file called access.log under /var/log/apache2/ (or under /var/log/nginx/ with nginx).
In there you should have the history of all your HTTP requests.
Maybe some of the weird requests with multiple params come from a strange IP address.
If this is the case, it means that someone is monitoring and testing the website (potentially with the strange parameters) for security reasons.
If this is the case, I guess you are on HTTP and you should switch to HTTPS.
Anyway, with the new code you can rule out this function and investigate the other parts of the system.
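Another way to rule out the redirect code, sketched here under the assumption that the target URL may already carry param1 and param2 (for example because a request passes through the handler more than once): merge the parameters by key instead of appending them, so the URL cannot grow. appendQuery is a made-up name, not a Laravel function.

```php
<?php
// Hypothetical helper: add parameters to a URL, overwriting same-named
// parameters that are already present instead of appending duplicates.
function appendQuery(string $url, array $extra): string
{
    // Split off a fragment, if any, so it can be re-attached at the end.
    $fragment = '';
    $hashPos = strpos($url, '#');
    if ($hashPos !== false) {
        $fragment = substr($url, $hashPos);
        $url = substr($url, 0, $hashPos);
    }

    // Parse whatever query string the URL already has.
    $existing = [];
    $queryPos = strpos($url, '?');
    if ($queryPos !== false) {
        parse_str(substr($url, $queryPos + 1), $existing);
        $url = substr($url, 0, $queryPos);
    }

    // Later values win, so each key appears exactly once.
    $query = http_build_query(array_replace($existing, $extra));
    return $url . ($query !== '' ? '?' . $query : '') . $fragment;
}
```

In the out() function from the question this would replace the manual '?' / '&' logic: return redirect(appendQuery($url, $qs));. Whether duplicated parameters are really what causes the 414 is not something this sketch can confirm; it only removes one candidate.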

How can I detect if a given URL is the current one?

I need to detect if a provided URL matches the one currently navigated to. Mind you the following are all valid, yet semantically equivalent URLs:
https://www.example.com/path/to/page/index.php?parameter=value
https://www.example.com/path/to/page/index.php
https://www.example.com/path/to/page/
https://www.example.com/path/to/page
http://www.example.com/path/to/page
//www.example.com/path/to/page
//www/path/to/page
../../../path/to/page
../../to/page
../page
./
The final function must return true if the given URL points back to the current page, or false if it does not. I do not have a list of expected URLs; this will be used for a client who just wants links to be disabled when they link to the current page. Note that I wish to ignore parameters, as these do not indicate the current page on this site. I got as far as using the following regex:
/^((https?:)?\/\/www(\.example\.com)\/path\/to\/page\/?(index.php)?(\?.+=.*(\&.+=.*)*)?)|(\.\/)$/i
where https?, www, \.example\.com, \/path\/to\/page, and index.php are dynamically detected with $_SERVER["PHP_SELF"] and made into regex form, but that doesn't match the relative URLs like ../../to/page.
EDIT: I got a bit farther with the regex: refiddle.com/gv8
now I'd just need PHP to dynamically create the regex for any given page.

First off, there is no way to predict the total list of valid URLs that will result in display of the current page, since you can't predict (or control) external links that might link back to the page. What if someone uses TinyURL or bit.ly? A regex will not cut the mustard.
If what you need is to ensure that a link does not result in the same page, then you need to TEST it. Here's a basic concept:
Every page has a unique ID. Call it a serial number. It should be persistent. The serial number should be embedded somewhere predictable (though perhaps invisibly) within the page.
As the page is created, your PHP will need to walk through all the links for each page, visit each one, and determine whether the link resolves to a page with a serial number that matches the calling page's serial number.
If the serial number does not match, display the link as a link. Otherwise, display something else.
Obviously, this will be an arduous, resource-intensive process for page production. You really don't want to solve your problem this way.
With your "ultimate goal" comment in mind, I suspect your best approach is to be approximate. Here are some strategies...
First option is also the simplest. If you're building a content management system that USUALLY creates links in one format, just support that format. Wikipedia's approach works because a [[link]] is something THEY generate, so THEY know how it's formatted.
Second is more the direction you've gone with your question. The elements of a URL are "protocol", "host", "path" and "query string". You can break them out into a regex, and possibly get it right. You've already stated that you intend to ignore the query string. So ... start with '((https?:)?//(www\.)?example\.com)?' . $_SERVER['SCRIPT_NAME'] and add endings to suit. Other answers are already helping you with this.
Third option is quite a bit more complex, but gives you more fine-grained control over your test. As with the last option, you have the various URL elements. You can test for the validity of each without using a regex. For example:
$a = array(); // init array for valid URLs
// Step through each variation of our path...
foreach([$_SERVER['SCRIPT_NAME'], $_SERVER['REQUEST_URI']] as $path) {
// Step through each variation of our host...
foreach ([$_SERVER['HTTP_HOST'], explode(".", $_SERVER['HTTP_HOST'])[0]] as $server) {
// Step through each variation of our protocol...
foreach (['https://','http://','//'] as $protocol) {
// Set the URL as a key.
$a[ $protocol . $server . $path ] = 1;
}
}
// Also for each path, step through directories and parents...
$apath=explode('/', $path); // turn the path into an array
unset($apath[0]); // strip the leading slash
for( $i = 1; $i <= count($apath); $i++ ) {
if (strlen($apath[$i])) {
$a[ str_repeat("../", 1+count($apath)-$i) . implode("/", $apath) ] = 1;
// add relative paths
}
unset($apath[$i]);
}
$a[ "./" . implode("/", $apath) ] = 1; // add current directory
}
Then simply test whether the link (minus its query string) is an index within the array. Or adjust to suit; I'm sure you get the idea.
I like this third solution the best.
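A different take on testing the URL elements without a regex, as a rough sketch: instead of listing every valid variation up front, resolve the link against the current page and compare the resulting paths. The function names are made up, the current path is assumed to start with a slash (as $_SERVER['SCRIPT_NAME'] does), only index.php is treated as an implied file name, and short host names such as //www/ are not handled.

```php
<?php
// Hypothetical helper: collapse "." and ".." segments of an absolute path.
function normalizePath(string $path): string
{
    $out = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;
        }
        if ($segment === '..') {
            array_pop($out);
            continue;
        }
        $out[] = $segment;
    }
    return '/' . implode('/', $out);
}

// Hypothetical helper: normalized path with a trailing index.php dropped.
function canonicalPath(string $path): string
{
    $path = preg_replace('~/index\.php$~i', '', normalizePath($path));
    return $path === '' ? '/' : $path;
}

// Hypothetical helper: does $link lead to the page at $currentPath on $currentHost?
function pointsToCurrentPage(string $link, string $currentPath, string $currentHost): bool
{
    $parts = parse_url($link);
    if ($parts === false) {
        return false;
    }
    if (isset($parts['host']) && strcasecmp($parts['host'], $currentHost) !== 0) {
        return false; // different host
    }
    $path = $parts['path'] ?? '';
    if ($path === '' || $path[0] !== '/') {
        // Relative link: resolve it against the directory of the current page.
        $path = substr($currentPath, 0, strrpos($currentPath, '/') + 1) . $path;
    }
    // The query string is ignored because parse_url() keeps it out of 'path'.
    return canonicalPath($path) === canonicalPath($currentPath);
}
```

Usage would be along the lines of pointsToCurrentPage($href, $_SERVER['SCRIPT_NAME'], $_SERVER['HTTP_HOST']). Like the array approach, it cannot see through external redirects or URL shorteners.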

A regex isn't actually necessary to strip off all the query parameters. You could use strtok():
$url = strtok($url, '?');
And, to check the output for your URL array:
$url_list = <<<URL
https://www.example.com/path/to/page/index.php?parameter=value
https://www.example.com/path/to/page/index.php
...
./?parameter=value
./
URL;
$urls = explode("\n", $url_list);
foreach ($urls as $url) {
$url = strtok($url, '?'); // remove everything after ?
echo $url."\n";
}
As a function (could be improved):
function checkURLMatch($url, $url_array) {
$url = strtok($url, '?'); // remove everything after ?
if( in_array($url, $url_array)) {
// url exists array
return True;
} else {
// url not in array
return False;
}
}

You can use this approach:
function checkURL($me, $s) {
$dir = dirname($me) . '/';
// you may need to refine this
$s = preg_filter(array('~^//~', '~/$~', '~\?.*$~', '~\.\./~'),
array('', '', '', $dir), $s);
// parse resulting URL
$url = parse_url($s);
var_dump($url);
// match parsed URL's path with self
return ($url['path'] === $me);
}
// your page's URL with stripped out .php
$me = str_replace('.php', '', $_SERVER['PHP_SELF']);
// assume this is the URL you are matching against
$s = '../page/';
// compare $me with $s
$ret = checkURL($me, $s);
var_dump($ret);
Live Demo: http://ideone.com/OZZM53

As I have been paid to work on this for the last couple days, I wasn't just sitting around waiting for an answer. I've come up with one that works in my test platform; what does everyone else think? It feels a little bloated, but also feels bulletproof.
Debug echoes left in in case you wanna echo out some stuffs.
global $debug;$debug = false; // toggle debug echoes and var_dumps
/**
* Returns a boolean indicating whether the given URL is the current one.
*
* @param $otherURL the other URL, as a string. Can be any URL, relative or canonical. Invalid URLs will not match.
*
* @return true iff the given URL points to the same place as the current one
*/
function isCurrentURL($otherURL)
{global $debug;
if($debug)echo"<!--\r\nisCurrentURL($otherURL)\r\n{\r\n";
// BEGIN Parse other URL
$otherParts = parse_url($otherURL);
$otherHost = $otherParts["host"] ?? ""; // empty string when the URL has no host (e.g. relative URLs)
$otherDomain = explode(".", $otherHost);
$otherSubdomain = array_shift($otherDomain); // subdom only
$otherDomain = implode(".", $otherDomain); // domain only
$otherFilepath = $otherParts["path"] ?? "";
$otherProtocol = $otherParts["scheme"] ?? "";
// END Parse other URL
// BEGIN Get current URL
#if($debug){echo '$_SERVER == '; var_dump($_SERVER);}
$thisProtocol = $_SERVER["HTTP_X_FORWARDED_PROTO"]; // http or https
$thisHost = $_SERVER["HTTP_HOST"]; // subdom or subdom.domain.tld
$thisDomain = explode(".", $thisHost);
$thisSubdomain = array_shift($thisDomain); // subdom only
$thisDomain = implode(".", $thisDomain); // domain only
if ($thisDomain == "")
$thisDomain = $otherDomain;
$thisFilepath = $_SERVER["PHP_SELF"]; // /path/to/file.php
$thisURL = "$thisProtocol://$thisHost$thisFilepath";
// END Get current URL
if ($thisURL == $otherURL) // unlikely, but possible. Checked here because $thisURL is only known at this point.
return true;
if($debug)echo"Current URL is $thisURL ($thisProtocol, $thisSubdomain, $thisDomain, $thisFilepath).\r\n";
if($debug)echo"Other URL is $otherURL ($otherProtocol, $otherHost, $otherFilepath).\r\n";
$thisDomainRegexed = isset($thisDomain) && $thisDomain != null && $thisDomain != "" ? "(\." . str_replace(".","\.",$thisDomain) . ")?" : ""; // prepare domain for insertion into regex
// v this makes the last slash before index.php optional
$regex = "/^(($thisProtocol:)?\/\/$thisSubdomain$thisDomainRegexed)?" . preg_replace('/index\\\..+$/i','?(index\..+)?', str_replace(array(".", "/"), array("\.", "\/"), $thisFilepath)) . '$/i';
if($debug)echo "\r\nregex is $regex\r\nComparing regex against $otherURL";
if (preg_match($regex, $otherURL))
{
if($debug)echo"\r\n\tIt's a match! Returning true...\r\n}\r\n-->";
return true;
}
else
{
if($debug)echo"\r\n\tOther URL is NOT a fully-qualified URL in this subdomain. Checking if it is relative...";
if($otherURL == $thisFilepath) // somewhat likely
{
if($debug)echo"\r\n\t\tOther URL and this filepath are an exact match! Returning true...\r\n}\r\n-->";
return true;
}
else
{
if($debug)echo"\r\n\t\tFilepath is not an exact match. Testing against regex...";
$regex = regexFilepath($thisFilepath);
if($debug)echo"\r\n\t\tNew Regex is $regex";
if($debug)echo"\r\n\t\tComparing regex against $otherFilepath...";
if (preg_match($regex, $otherFilepath))
{
if($debug)echo"\r\n\t\t\tIt's a match! Returning true...\r\n}\r\n-->";
return true;
}
}
}
if($debug)echo"\r\nI tried my hardest, but couldn't match $otherURL to $thisURL. Returning false...\r\n}\r\n-->";
return false;
}
/**
* Uses the given filepath to create a regex that will match it in any of its relative representations.
*
* @param $path the filepath to be converted
*
* @return a regex that matches all relative forms of the given filepath
*/
function regexFilepath($path)
{global $debug;
if($debug)echo"\r\nregexFilepath($path)\r\n{\r\n";
$filepathArray = explode("/", $path);
if (count($filepathArray) == 0)
throw new Exception("given parameter not a filepath: $path");
if ($filepathArray[0] == "") // this can happen if the path starts with a "/"
array_shift($filepathArray); // strip the first element off the array
$isIndex = preg_match("/^index\..+$/i", end($filepathArray));
$filename = array_pop($filepathArray);
if($debug){var_dump($filepathArray);}
$ret = '';
foreach($filepathArray as $i)
$ret = "(\.\.\/$ret$i\/)?"; // make a pseudo-recursive relative filepath
if($debug)echo "\r\n$ret";
$ret = preg_replace('/\)\?$/', '?)', $ret); // remove the last '?' and add one before the last '\/'
if($debug)echo "\r\n$ret";
$ret = '/^' . ($ret == '' ? '\.\/' : "((\.\/)|$ret)") . ($isIndex ? '(index\..+)?' : str_replace('.', '\.', $filename)) . '$/i'; // if this filepath leads to an index.php (etc.), then that filename is implied and irrelevant.
if($debug)echo"\r\n}\r\n";
return $ret; // hand the finished regex back to isCurrentURL()
}
This seems to match everything I need it to match, and not what I don't need it to.

Get youtube id for all url types

The following code works with all YouTube domains except for youtu.be. An example would be: http://www.youtube.com/watch?v=ZedLgAF9aEg would turn into: ZedLgAF9aEg
My question is how would I be able to make it work with http://youtu.be/ZedLgAF9aEg.
I'm not so great with regex so your help is much appreciated. My code is:
$text = preg_replace("#[&\?].+$#", "", preg_replace("#http://(?:www\.)?youtu\.?be(?:\.com)?/(embed/|watch\?v=|\?v=|v/|e/|.+/|watch.*v=|)#i", "", $text)); }
$text = (htmlentities($text, ENT_QUOTES, 'UTF-8'));
Thanks again!

//$url = 'http://www.youtube.com/watch?v=ZedLgAF9aEg';
$url = 'http://youtu.be/ZedLgAF9aEg';
if (FALSE === strpos($url, 'youtu.be/')) {
parse_str(parse_url($url, PHP_URL_QUERY), $id);
$id = $id['v'];
} else {
$id = basename($url);
}
echo $id; // ZedLgAF9aEg
This will work for both versions of the URL. Do not use a regex for this: PHP has built-in functions for parsing URLs, as demonstrated above, which are faster and less likely to break.
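The same parse_url/parse_str approach, sketched as a reusable function that checks the host rather than searching the whole string for youtu.be. The name youtubeId is made up, and the sketch only covers the watch?v= and youtu.be forms, not /embed/ or /v/ paths.

```php
<?php
// Hypothetical helper: extract the video ID from a watch?v= or youtu.be URL.
function youtubeId(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || empty($parts['host'])) {
        return null; // not an absolute URL
    }

    // Short links carry the ID in the path: http://youtu.be/<id>
    $host = strtolower($parts['host']);
    if ($host === 'youtu.be' || $host === 'www.youtu.be') {
        $id = trim($parts['path'] ?? '', '/');
        return $id !== '' ? $id : null;
    }

    // Everything else is expected to carry the ID in the "v" parameter.
    $query = [];
    parse_str($parts['query'] ?? '', $query);
    return isset($query['v']) && is_string($query['v']) && $query['v'] !== ''
        ? $query['v']
        : null;
}
```

Because the decision is based on the host, a watch URL that merely contains feature=youtu.be in its query string is still read through the v parameter.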

Your regex appears to solve the problem as it stands now? I didn't try it in php, but it appears to work fine in my editor.
The first part of the regex, http://(?:www\.)?youtu\.?be(?:\.com)?/, matches http://youtu.be/, and the second part, (embed/|watch\?v=|\?v=|v/|e/|.+/|watch.*v=|), ends with |), which means it can match nothing (making it optional). In other words, it would trim away http://youtu.be/, leaving only the ID.
A more intuitive way of writing it would be to make the whole grouping optional, I suppose, but as far as I can tell your regex already solves your problem:
#http://(?:www\.)?youtu\.?be(?:\.com)?/(embed/|watch\?v=|\?v=|v/|e/|.+/|watch.*v=)?#i
Note: Your regex would work with the www.youtu.be.com domain as well. It would be stripped away, but something to watch out for if you use this for validating input.
Update:
If you only want to match URLs inside [youtube][/youtube] tags, you could use lookarounds.
Something along the lines of:
(?<=\[youtube\])(?:http://(?:www\.)?youtu\.?be(?:\.com)?/(?:embed/|watch\?v=|\?v=|v/|e/|[^\[]+/|watch.*v=)?)(?=.+\[/youtube\])
You could further refine it by making the .+ in the look ahead only match valid URL characters etc.

Try this, hope it'll help you
function YouTubeUrl($url)
{
if($url!='')
{
$newUrl='';
$videoLink1=$url;
$findKeyWord='youtu.be';
$toBeReplaced='www.youtube.com';
// IsContain() takes the haystack first and the needle second
if(IsContain($videoLink1,'watch?v='))
{
$newUrl=tMakeUrl($videoLink1);
}
else if(IsContain($videoLink1, $findKeyWord))
{
$videoLinkArray=explode('/',$videoLink1);
$Protocol='';
if(IsContain($videoLink1,'://'))
{
$protocolArray=explode('://',$videoLink1);
$Protocol=$protocolArray[0];
}
$file=$videoLinkArray[count($videoLinkArray)-1];
$newUrl='www.youtube.com/watch?v='.$file;
if($Protocol!='')
$newUrl=$Protocol.'://'.$newUrl; // put the original protocol back in front
else
$newUrl=tMakeUrl($newUrl);
}
else
$newUrl=tMakeUrl($videoLink1);
return $newUrl;
}
return '';
}
function IsContain($string,$findKeyWord)
{
if(strpos($string,$findKeyWord)!==false)
return true;
else
return false;
}
function tMakeUrl($url)
{
$tSeven=substr($url,0,7);
$tEight=substr($url,0,8);
if($tSeven!="http://" && $tEight!="https://")
{
$url="http://".$url;
}
return $url;
}

You can use the function below for any YouTube URL.
I hope this helps.
function checkYoutubeId($id)
{
$youtube = "http://www.youtube.com/oembed?url=". $id ."&format=json";
$curl = curl_init($youtube);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
$return = curl_exec($curl);
curl_close($curl);
return json_decode($return, true);
}
This function returns the YouTube video details if the given value matches a YouTube video.

A small improvement to @rvalvik's answer would be to include mobile links (I noticed this while working with a customer who used an iPad to navigate, copy and paste links). In this case, the URL has an m (mobile) instead of www. The regex then becomes:
#(https?://)?(?:www\.)?(?:m\.)?(?:youtu\.be/|youtube\.com(?:/embed/|/v/|/watch\?.*?v=))([\w\-]{10,12}).*#x
Hope it helps.

A slight improvement of another answer:
if (strpos($url, 'feature=youtu.be') !== FALSE || strpos($url, 'youtu.be') === FALSE )
{
parse_str(parse_url($url, PHP_URL_QUERY), $id);
$id = $id['v'];
}
else
{
$id = basename($url);
}
This takes into account youtu.be still being in the URL, but not the URL itself (it does happen!) as it could be the referring feature link.

Other answers miss the point that some YouTube links are part of a playlist and also have a list parameter, which is required for the embed code. So to extract the embed code from a link, one could try this JS code:
let urlEmbed = "https://www.youtube.com/watch?v=iGGolqb6gDE&list=PL2q4fbVm1Ik6DCzm9XZJbNwyHtHGclcEh&index=32"
let embedId = urlEmbed.split('v=')[1];
let parameterStringList = embedId.split('&');
if (parameterStringList.length > 1) {
embedId = parameterStringList[0];
let listString = parameterStringList.filter((parameterString) =>
parameterString.includes('list')
);
if (listString.length > 0) {
listString = listString[0].split('=')[1];
embedId = `${parameterStringList[0]}?${listString}`;
}
}
console.log(embedId)
Try it out here: https://jsfiddle.net/AMITKESARI2000/o62dwj7q/

try this :
$string = explode("=","http://www.youtube.com/watch?v=ZedLgAF9aEg");
echo $string[1];
would turn into: ZedLgAF9aEg

url cutoff by link module or pagepeeker formatter issue in drupal 7

I have a Drupal 7 question that may involve some PHP help. I have created an RSS feed from Google Alerts that I am mapping into fields. I have had success mapping into all the fields except the Link module field, where I have put a field formatter that creates a Pagepeeker screenshot by attaching the appropriate URL server query to the feed's URL. Feeds is doing its job by taking the Item URL (link) and putting it into the field correctly. I am having an issue with either Pagepeeker or the Link module, because the following keeps happening.
To recap:
Google Alert feed -> Link module field -> pagepeeker screenshot formatter
Here's the error.
The URL that Google Alerts provides is
http://www.google.com/url?sa=X&q=http://www.beautyjunkiesunite.com/WP/2012/05/30/whats-new-anastasia-beverly-hills-lash-genius/&ct=ga&cad=CAcQARgAIAEoATAAOABA3t-Y_gRIAlgBYgVlbi1VUw&cd=F7w9TwL-6ao&usg=AFQjCNG2rbJCENvRR2_k6pL9RntjP66Rvg
When the link is displayed I get :
http://pagepeeker.com/thumbs.php?size=m&url=www.google.com/url
It's cutting the URL at url and not getting the rest of it.
Here's the code that Pagepeeker uses to parse the URL:
<?php
function _pagepeeker_format_url($url, $domain_only = FALSE) {
if (filter_var($url, FILTER_VALIDATE_URL) === FALSE) {
return FALSE;
}
// try to parse the url
$parsed_url = parse_url($url);
if (!empty($parsed_url)) {
$host = (!empty($parsed_url['host'])) ? $parsed_url['host'] : '';
$port = (!empty($parsed_url['port'])) ? ':' . $parsed_url['port'] : '';
$path = (!empty($parsed_url['path'])) ? $parsed_url['path'] : '';
$query = (!empty($parsed_url['query'])) ? '?' . $parsed_url['query'] : '';
$fragment = (!empty($parsed_url['fragment'])) ? '#' . $parsed_url['fragment'] : '';
if ($domain_only) {
return $host . $port;
}
else {
return $host . $port . $path . $query . $fragment;
}
}
return FALSE;
}
Could this be the problem?
Please let me know if I can clarify in any way.
What I need is for the entire URL to get processed, not just the truncated one.
Thanks!

I have seen a very similar question here on SO or on the Drupal SO site but couldn't find it, so I'm writing my answer again here.
<?php
function _pagepeeker_format_url($url, $domain_only = FALSE) {
if (filter_var($url, FILTER_VALIDATE_URL) === FALSE) {
return FALSE;
}
//$url = 'http://www.google.com/url?sa=X&q=http://www.beautyjunkiesunite.com/WP/2012/05/30/whats-new-anastasia-beverly-hills-lash-genius/&ct=ga&cad=CAcQARgAIAEoATAAOABA3t-Y_gRIAlgBYgVlbi1VUw&cd=F7w9TwL-6ao&usg=AFQjCNG2rbJCENvRR2_k6pL9RntjP66Rvg';
// Now we use parse_url to split the url to an array with url parts.
$parsed_url = parse_url($url);
// $parsed_url['query'] is 'sa=X&q=http://www.beautyjunkiesunite.com/WP/2012/05/30/whats-new-anastasia-beverly-hills-lash-genius/&ct=ga&cad=CAcQARgAIAEoATAAOABA3t-Y_gRIAlgBYgVlbi1VUw&cd=F7w9TwL-6ao&usg=AFQjCNG2rbJCENvRR2_k6pL9RntjP66Rvg'
// ";" can also be used to separate params. But & is the usual one so using it.
$queryParts = explode('&', $parsed_url['query']);
$params = array();
foreach ($queryParts as $param) {
$item = explode('=', $param);
// sa = X, etc.
$params[$item[0]] = $item[1];
}
//$params is now an array with query parts.
// $params['sa'] = 'X' , q = 'http://www.beautyjunkiesunite.com/WP/2012/05/30/whats-new-anastasia-beverly-hills-lash-genius', etc.
if ($domain_only){
$new_url_parts = parse_url($params['q']);
return $new_url_parts['host'];
}
else{
return $params['q'];
}
}
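The unwrapping step on its own can be sketched with parse_str instead of the manual explode loop. This is only an illustration: unwrapGoogleRedirect is a made-up name, and it assumes the real target always sits in the q parameter, as it does in the example URL from the question.

```php
<?php
// Hypothetical helper: return the "q" target of a Google redirect URL,
// or the URL unchanged when it has no usable "q" parameter.
function unwrapGoogleRedirect(string $url): string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return $url; // no query string at all
    }

    // parse_str() splits on '&' and urldecodes each value.
    $params = [];
    parse_str($query, $params);

    return isset($params['q']) && is_string($params['q']) && $params['q'] !== ''
        ? $params['q']
        : $url;
}
```

The result could then be passed to the original _pagepeeker_format_url() so the rest of the module keeps working on an ordinary URL.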
