Get keyword from a (search engine) referrer url using PHP - php

I am trying to get the search keyword from a referrer url. Currently, I am using the following code for Google urls. But sometimes it is not working...
$query_get = "(q|p)";
$referrer = "http://www.google.com/search?hl=en&q=learn+php+2&client=firefox";
preg_match('/[?&]'.$query_get.'=(.*?)[&]/',$referrer,$search_keyword);
Is there another/clean/working way to do this?
Thank you,
Prasad

If you're using PHP5 take a look at http://php.net/parse_url and http://php.net/parse_str
Example:
// The referrer
$referrer = 'http://www.google.com/search?hl=en&q=learn+php+2&client=firefox';
// Extract the query string from the URL (a string, or null if there is none)
$parsed = parse_url( $referrer, PHP_URL_QUERY );
// Parse the query string into an array
parse_str( $parsed, $query );
// Output the result
echo $query['q'];

Different search engines use different query string parameters. After trying Wiliam's method, I worked out my own, because Yahoo uses 'p' but sometimes 'q'.
$referrer = "http://search.yahoo.com/search?p=www.stack+overflow%2Ccom&ei=utf-8&fr=slv8-msgr&xargs=0&pstart=1&b=61&xa=nSFc5KjbV2gQCZejYJqWdQ--,1259335755";
$referrer_query = parse_url($referrer);
$referrer_query = $referrer_query['query'];
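$q = "(?:q|p)"; // Yahoo uses both parameter names; I use a switch() to pick the pattern for each search engine
preg_match('/(?:^|&)'.$q.'=([^&]*)/',$referrer_query,$keyword); // match against the query string only; also works when the parameter comes last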
$keyword = urldecode($keyword[1]);
echo $keyword; //Outputs "www.stack overflow,com"
Thank you,
Prasad

To supplement the other answers, note that the query string parameter that contains the search terms varies by search provider. This snippet of PHP shows the correct parameter to use:
$search_engines = array(
'q' => 'alltheweb|aol|ask|ask|bing|google',
'p' => 'yahoo',
'wd' => 'baidu',
'text' => 'yandex'
);
Source: http://betterwp.net/wordpress-tips/get-search-keywords-from-referrer/
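The mapping above could be applied roughly like this. It is only a minimal sketch: extract_search_keyword() is a hypothetical helper name, and it assumes the engine name appears as a label in the referrer host (e.g. "www.google.com", "search.yahoo.com").

```php
<?php
// Hypothetical helper: picks the query parameter by matching the engine name against the referrer host.
function extract_search_keyword($referrer, array $search_engines)
{
    $host  = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null; // no host or no query string, nothing to extract
    }
    parse_str($query, $params); // parse_str() URL-decodes the values
    foreach ($search_engines as $param => $engines) {
        // $engines is a "|"-separated list, so it can be used as a regex alternation
        if (preg_match('/(?:^|\.)(?:' . $engines . ')\./i', $host)
            && isset($params[$param]) && is_string($params[$param])) {
            return $params[$param];
        }
    }
    return null; // unknown engine, or its parameter is missing
}

$search_engines = array(
    'q'    => 'alltheweb|aol|ask|bing|google',
    'p'    => 'yahoo',
    'wd'   => 'baidu',
    'text' => 'yandex',
);

echo extract_search_keyword('http://www.google.com/search?hl=en&q=learn+php+2&client=firefox', $search_engines), "\n"; // learn php 2
echo extract_search_keyword('http://search.yahoo.com/search?p=stack+overflow&ei=utf-8', $search_engines), "\n";       // stack overflow
```

Matching on the host first avoids picking up an unrelated 'q' or 'p' parameter from a site that is not a search engine.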

<?php
class GET_HOST_KEYWORD
{
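public function get_host_and_keyword($_url) {
$chunk_url = parse_url($_url);
$_data = array();
$_data["host"] = isset($chunk_url['host']) ? $chunk_url['host'] : '';
// Parse into an array: calling parse_str() without a result argument no longer works in PHP 8
$params = array();
if (isset($chunk_url['query'])) {
parse_str($chunk_url['query'], $params);
}
// Prefer 'p' (Yahoo), then fall back to 'q'
$_data["keyword"] = !empty($params['p']) ? $params['p'] : (!empty($params['q']) ? $params['q'] : '');
return $_data;
}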
}
// Sample Example
$obj = new GET_HOST_KEYWORD();
print_r($obj->get_host_and_keyword('http://www.google.co.in/search?sourceid=chrome&ie=UTF-&q=hire php php programmer'));
// sample output
//Array
//(
// [host] => www.google.co.in
// [keyword] => hire php php programmer
//)
// $search_engines = array(
// 'q' => 'alltheweb|aol|ask|ask|bing|google',
// 'p' => 'yahoo',
// 'wd' => 'baidu',
// 'text' => 'yandex'
//);
?>

$query = parse_url($request, PHP_URL_QUERY);

This one should work for Google, Bing and, sometimes, Yahoo Search:
if( isset($_SERVER['HTTP_REFERER']) && $_SERVER['HTTP_REFERER']) {
$query = getSeQuery($_SERVER['HTTP_REFERER']);
echo $query;
} else {
echo "I think they spelled REFERER wrong? Anyways, your browser says you don't have one.";
}
function getSeQuery($url = false) {
$segments = parse_url($url);
$keywords = null;
if($query = isset($segments['query']) ? $segments['query'] : (isset($segments['fragment']) ? $segments['fragment'] : null)) {
parse_str($query, $segments);
$keywords = isset($segments['q']) ? $segments['q'] : (isset($segments['p']) ? $segments['p'] : null);
}
return $keywords;
}

I believe Google and Yahoo have changed their behaviour so that search keywords and other parameters are no longer included in the referrer URL, which means they cannot be retrieved from the HTTP referrer.
Please let me know if the recommendations above can still provide the search keywords.
This is what I now receive from the HTTP referrer on my website:
from google: https://www.google.co.in/
from yahoo: https://in.yahoo.com/
Ref: https://webmasters.googleblog.com/2012/03/upcoming-changes-in-googles-http.html
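For referrers like the two above there is simply no query string to parse, so any of the approaches in this thread has to cope with "no keyword". A minimal sketch of that guard, where keyword_or_null() is a hypothetical helper name and only the 'q' and 'p' parameters are checked:

```php
<?php
// Hypothetical helper: returns the keyword, or null when the referrer carries no search terms.
function keyword_or_null($referrer)
{
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($query) || $query === '') {
        return null; // e.g. "https://www.google.co.in/" has no query string at all
    }
    parse_str($query, $params);
    foreach (array('q', 'p') as $name) {
        if (isset($params[$name]) && is_string($params[$name]) && $params[$name] !== '') {
            return $params[$name];
        }
    }
    return null;
}

var_dump(keyword_or_null('https://www.google.co.in/'));                // NULL
var_dump(keyword_or_null('http://www.google.com/search?q=learn+php')); // string(9) "learn php"
```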

Related

How can I efficiently add a GET parameter with either a ? or & in PHP?

I have to add a GET variable to a url. But the URL might already have GET variables. What's the most efficient way to add this new variable?
Example URLs:
http://domain.com/
http://domain.com/index.html?name=jones
I need to add: tag=xyz:
http://domain.com/?tag=xyz
http://domain.com/index.html?name=jones&tag=xyz
What's the most efficient way to know whether to prepend my string with a ? or &?
Here's a version of the function I have so far:
// where arrAdditions looks like array('key1'=>'value1','key2'=>'value2');
function appendUrlQueryString($url, $arrAdditions) {
$arrQueryStrings = array();
foreach ($arrAdditions as $k=>$v) {
$arrQueryStrings[] = $k . '=' . $v;
}
$strAppend = implode('&',$arrQueryStrings);
if (strpos($url, '?') !== false) {
$url .= '&' . $strAppend;
} else {
$url .= '?' . $strAppend;
}
return $url;
}
But, is simply looking for the ? in the existing url problematic? For example, is it possible that a url includes a ? but not actual queries, perhaps as an escaped character?
Take a look at PECL's http_build_url(). The documentation page says:
Build a URL.
The parts of the second URL will be merged into the first according to the flags argument.
Addition:
If you don't have PECL installed, we can jump through some hoops. This approach is somewhat solid right up until we try to rebuild the new URL. Stock PHP (minus PECL) doesn't have a reverse of parse_url(). Making it harder, parse_url() removes some of the grammar from a URL in the resulting parts array so we have to put them back in when we reassemble. http_build_url() can take care of this for us, but if it were available, you wouldn't be reading this portion as it's what I originally recommended. Anyway, here's that code:
<?php
/**
* addQueryParam - given a URL and some new params for its query string, return the modified URL
*
* @see http://us1.php.net/parse_url
* @see http://us1.php.net/parse_str
* @throws Exception on bad input
* @param STRING $url A parseable URL to add query params to
* @param MIXED $input_query_vars - STRING of & separated pairs of = separated key values OR ASSOCIATIVE ARRAY of STRING keys => STRING values
* @return STRING new URL
*/
function addQueryParam ($url, $input_query_vars) {
// Parse new parameters
if (is_string($input_query_vars)) {
parse_str($input_query_vars, $input_query_vars);
}
// Ensure array of parameters now available
if (!is_array($input_query_vars)) {
throw new Exception(__FUNCTION__ . ' expects associative array or query string as second parameter.');
}
// Break up given URL
$url_parts = parse_url($url);
// Test for proper URL parse
if (!is_array($url_parts)) {
throw new Exception(__FUNCTION__ . ' expects parseable URL as first parameter');
}
// Produce array of original query vars
$original_query_vars = array();
if (isset($url_parts['query']) && $url_parts['query'] !== '') {
parse_str($url_parts['query'], $original_query_vars);
}
// Merge new params into original
$new_query_vars = array_merge($original_query_vars, $input_query_vars);
// replace the original query string
$url_parts['query'] = http_build_query($new_query_vars);
// Put URL grammar back in place
if (!empty($url_parts['scheme'])) {
$url_parts['scheme'] .= '://';
}
if (!empty($url_parts['query'])) {
$url_parts['query'] = '?' . $url_parts['query'];
}
if (!empty($url_parts['fragment'])) {
$url_parts['fragment'] = '#' . $url_parts['fragment'];
}
// Put it all back together and return it
return implode('', $url_parts);
}
// Your demo URLs
$url1 = 'http://domain.com/';
$url2 = 'http://domain.com/index.html?name=jones';
//Some usage (I did this from CLI)
echo $url1, "\n";
echo addQueryParam($url1, 'tag=xyz'), "\n";
echo addQueryParam($url1, array('tag' => 'xyz')), "\n";
echo $url2, "\n";
echo addQueryParam($url2, 'tag=xyz'), "\n";
echo addQueryParam($url2, array('tag' => 'xyz')), "\n";
echo addQueryParam($url2, array('name' => 'foo', 'tag' => 'xyz')), "\n";
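Since stock PHP has no reverse of parse_url(), one option is a small helper that reassembles the parts in a fixed order instead of relying on the order of the array keys. This is only a sketch: unparse_url() is a hypothetical name, not a PHP built-in, and it does not handle scheme-relative URLs such as "//domain.com/".

```php
<?php
// Hypothetical helper (not part of PHP): rebuilds a URL from parse_url() parts in a fixed order.
function unparse_url(array $parts)
{
    $scheme   = isset($parts['scheme']) ? $parts['scheme'] . '://' : '';
    $user     = isset($parts['user']) ? $parts['user'] : '';
    $pass     = isset($parts['pass']) ? ':' . $parts['pass'] : '';
    $auth     = ($user !== '' || $pass !== '') ? $user . $pass . '@' : '';
    $host     = isset($parts['host']) ? $parts['host'] : '';
    $port     = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path     = isset($parts['path']) ? $parts['path'] : '';
    $query    = (isset($parts['query']) && $parts['query'] !== '') ? '?' . $parts['query'] : '';
    $fragment = isset($parts['fragment']) ? '#' . $parts['fragment'] : '';
    return $scheme . $auth . $host . $port . $path . $query . $fragment;
}

// Add tag=xyz to a URL that also has credentials, a port and a fragment
$parts = parse_url('http://user:secret@domain.com:8080/index.html?name=jones#top');
parse_str(isset($parts['query']) ? $parts['query'] : '', $vars);
$vars['tag'] = 'xyz';
$parts['query'] = http_build_query($vars);
echo unparse_url($parts), "\n"; // http://user:secret@domain.com:8080/index.html?name=jones&tag=xyz#top
```

Building the string piece by piece keeps the user, password and port in the right place, which a plain implode() of the parts array does not guarantee.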
To check whether a parameter already exists, you could try parse_str().
It parses a query string and puts the variables into an array.
It gives misleading results if you pass it a full URL:
$url = "http://domain.com/index.html?name=jones&tag=xyz";
parse_str($url, $arr);
print_r($arr);
will give you
Array ( [http://domain_com/index_html?name] => jones [tag] => xyz )
but with
$url = "name=jones&tag=xyz";
you will get
Array ( [name] => jones [tag] => xyz )
You could explode the full URL by '?' and check the second part. After that check you know how to modify your URL. I'm not sure this works in every case, though.
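Instead of exploding on '?', parse_url() can hand parse_str() just the query string. A minimal sketch, where url_has_param() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: true when the URL's query string already contains $name.
function url_has_param($url, $name)
{
    $query = parse_url($url, PHP_URL_QUERY);
    if (!is_string($query)) {
        return false; // the URL has no query string
    }
    parse_str($query, $params);
    return array_key_exists($name, $params);
}

var_dump(url_has_param('http://domain.com/index.html?name=jones&tag=xyz', 'tag')); // bool(true)
var_dump(url_has_param('http://domain.com/index.html?name=jones', 'tag'));         // bool(false)
var_dump(url_has_param('http://domain.com/', 'tag'));                              // bool(false)
```

Keep in mind that parse_str() rewrites dots and spaces in parameter names to underscores, so names containing those characters need to be looked up in their converted form.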
$url_one = "http://www.stackoverflow.com?action=submit&id=example";
$new_params = "user=john&pass=123";
$final_url = $url_one."&".$new_params;
Now $final_url has old url and new params added to it.

Extract top domain from string php

I need to extract the domain name out of a string which could be anything. Such as:
$sitelink="http://www.somewebsite.com/product/3749875/info/overview.html";
or
$sitelink="http://subdomain.somewebsite.com/blah/blah/whatever.php";
In any case, I'm looking to extract the 'somewebsite.com' portion (which could be anything), and discard the rest.
With parse_url($url)
<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';
print_r(parse_url($url));
?>
The above example will output:
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)
Using those values
echo parse_url($url, PHP_URL_HOST); //hostname
or
$url_info = parse_url($url);
echo $url_info['host'];//hostname
here it is
<?php
$sitelink="http://www.somewebsite.com/product/3749875/info/overview.html";
$domain_pieces = explode(".", parse_url($sitelink, PHP_URL_HOST));
$l = sizeof($domain_pieces);
$secondleveldomain = $domain_pieces[$l-2] . "." . $domain_pieces[$l-1];
echo $secondleveldomain;
Note that this is probably not the behavior you are looking for because, for hosts like
stackoverflow.co.uk
it will echo "co.uk"
see:
http://publicsuffix.org/learn/
http://www.dkim-reputation.org/regdom-libs/
http://www.dkim-reputation.org/regdom-lib-downloads/ <-- downloads here, php included
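To illustrate why a suffix list is needed, here is a minimal sketch of the lookup. registrable_domain() is a hypothetical helper, and $suffixes is a tiny hand-picked sample, not the real Public Suffix List; the real list also has wildcard and exception rules that this sketch ignores.

```php
<?php
// Hypothetical helper. $suffixes is a small sample for illustration, NOT the real Public Suffix List.
function registrable_domain($host, array $suffixes)
{
    $host = strtolower($host);
    if (in_array($host, $suffixes, true)) {
        return null; // the host is itself a public suffix, e.g. "co.uk"
    }
    $labels = explode('.', $host);
    // Try the longest candidate suffix first, e.g. "co.uk" before "uk"
    for ($i = 1; $i < count($labels); $i++) {
        $candidate = implode('.', array_slice($labels, $i));
        if (in_array($candidate, $suffixes, true)) {
            // Registrable domain = the label directly left of the suffix, plus the suffix
            return $labels[$i - 1] . '.' . $candidate;
        }
    }
    return null; // suffix not in the sample list
}

$suffixes = array('com', 'uk', 'co.uk');
echo registrable_domain('www.somewebsite.com', $suffixes), "\n";           // somewebsite.com
echo registrable_domain('subdomain.stackoverflow.co.uk', $suffixes), "\n"; // stackoverflow.co.uk
```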
Two more complex URLs:
$url="https://www.example.co.uk/page/section/younameit";
or
$url="https://example.co.uk/page/section/younameit";
To get "www.example.co.uk":
$host=parse_url($url, PHP_URL_HOST);
To get "example.co.uk" only
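$domain = preg_replace('/^www\./', '', $host);
// Avoid explode('www.', $host)[1] (undefined when there is no "www.") and ltrim($host, 'www.') (it strips any leading "w" or "." characters, so "web.example.com" becomes "eb.example.com")
Whether or not your URL includes "www.", you get the same end result, i.e. "example.co.uk"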
Voilà!
You need a package that uses the Public Suffix List. Only this way can you correctly extract domains with second- and third-level TLDs (co.uk, a.bg, b.bg, etc.) and multilevel subdomains. Regex, parse_url() or string functions will never produce a fully correct result.
I recommend TLD Extract. Here is an example:
$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('http://www.somewebsite.com/product/3749875/info/overview.html');
$result->getSubdomain(); // will return (string) 'www'
$result->getHostname(); // will return (string) 'somewebsite'
$result->getSuffix(); // will return (string) 'com'
$result->getRegistrableDomain(); // will return (string) 'somewebsite.com'
For a string that could be anything, new approach:
function extract_plain_domain($text) {
$text=trim($text,"/");
$text=strtolower($text);
$parts=explode("/",$text);
if (substr_count($parts[0],"http")) {
$parts[0]="";
}
foreach ($parts as $val) { // each() was removed in PHP 8, so iterate with foreach
if (!empty($val)) { $text=$val; break; }
}
$parts=explode(".",$text);
if (empty($parts[2])) {
return $parts[0].".".$parts[1];
} else {
$num_parts=count($parts);
return $parts[$num_parts-2].".".$parts[$num_parts-1];
}
} // end function extract_plain_domain
You can use the Utopia Domains library (https://github.com/utopia-php/domains). It returns the domain's TLD and public suffix based on Mozilla's Public Suffix List (https://publicsuffix.org), and it can be used as an alternative to the TLDExtract package, which is currently archived.
You can use the parse_url() function to get the hostname from your URL, then use the Utopia Domains parser to get the correct suffix and join it with the domain name:
<?php
require_once './vendor/autoload.php';
use Utopia\Domains\Domain;
$url = 'http://demo.example.co.uk/site';
$domain = new Domain(parse_url($url, PHP_URL_HOST)); // demo.example.co.uk
var_dump($domain->get()); // demo.example.co.uk
var_dump($domain->getTLD()); // uk
var_dump($domain->getSuffix()); // co.uk
var_dump($domain->getName()); // example
var_dump($domain->getSub()); // demo
var_dump($domain->isKnown()); // true
var_dump($domain->isICANN()); // true
var_dump($domain->isPrivate()); // false
var_dump($domain->isTest()); // false
var_dump($domain->getName().'.'.$domain->getSuffix()); // example.co.uk

Remove "http://" from URL string

I am using a bit.ly shortener for my custom domain. It outputs http://shrt.dmn/abc123; however, I'd like it to just output shrt.dmn/abc123.
Here is my code.
//automatically create bit.ly url for wordpress widgets
function bitly()
{
//login information
$url = get_permalink(); //for wordpress permalink
$login = 'UserName'; //your bit.ly login
$apikey = 'API_KEY'; //add your bit.ly APIkey
$format = 'json'; //choose between json or xml
$version = '2.0.1';
//generate the URL
$bitly = 'http://api.bit.ly/shorten?version='.$version.'&longUrl='.urlencode($url).'&login='.$login.'&apiKey='.$apikey.'&format='.$format;
//fetch url
$response = file_get_contents($bitly);
//for json formatting
if(strtolower($format) == 'json')
{
$json = @json_decode($response,true);
echo $json['results'][$url]['shortUrl'];
}
else //for xml formatting
{
$xml = simplexml_load_string($response);
echo 'http://bit.ly/'.$xml->results->nodeKeyVal->hash;
}
}
As long as the value is a URL that starts with http://, this is the simplest possible solution:
$url = str_replace('http://', '', $url);
Change the following line:
echo $json['results'][$url]['shortUrl'];
to this one:
echo substr( $json['results'][$url]['shortUrl'], 7);
You want to do a preg_replace.
$variable = preg_replace( '/http:\/\//', '', $variable ); (this is untested; the : character does not need escaping in a regular expression).
You can also achieve the same effect with $variable = str_replace('http://', '', $variable );
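The answers above differ in edge cases: str_replace() removes every "http://" in the string, and substr(..., 7) assumes the string starts with exactly "http://". A minimal sketch that only strips a scheme at the very start and also covers https, with strip_scheme() as a hypothetical helper name:

```php
<?php
// Hypothetical helper: removes a leading "http://" or "https://" and nothing else.
function strip_scheme($url)
{
    // "#" is used as the regex delimiter so the slashes need no escaping
    return preg_replace('#^https?://#i', '', $url);
}

echo strip_scheme('http://shrt.dmn/abc123'), "\n";         // shrt.dmn/abc123
echo strip_scheme('https://shrt.dmn/abc123'), "\n";        // shrt.dmn/abc123
echo strip_scheme('shrt.dmn/?u=http://example.com'), "\n"; // shrt.dmn/?u=http://example.com (inner URL left alone)
```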

Detection - the title of the URL and the URL

How can I detect whether a text contains a URL and, if it does, get the title of the page it points to?
If there is one, the URL should be changed:
from: http://stackoverflow.com
into:
<detected:url="http://stackoverflow.com"/>
I also need to retrieve titles from external links, as in this example:
<title:http://stackoverflow.com/="the actual title from the stackoverflow"/>
This is for single URL case:
$url = "http://www.stackoverflow.com/";
$check_result = get_detected_and_title( $url );
function get_detected_and_title( $url )
{
$detected = '<detected:url="'.$url.'"/>';
$title = '';
$tmp_html = file_get_contents( $url );
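preg_match('/<title[^>]*>(.*?)<\/title>/is', $tmp_html, $res); // case-insensitive, and the title may span several lines
$title = isset($res[1]) ? '<title:'.$url.'="'.trim($res[1]).'"/>' : ''; // empty when the page has no <title>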
return array( $detected, $title );
}
Actually, after looking through SO's pages, I think this is closer to what you are looking for, although it needs some adjustment: How to mimic StackOverflow Auto-Link Behavior
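For the detection part on a whole text (rather than a single URL), something along these lines might do. It is a rough sketch: mark_urls() is a hypothetical helper, and the pattern is deliberately simple, so trailing punctuation directly after a URL would be captured as part of it.

```php
<?php
// Hypothetical helper: wraps every http(s) URL found in $text in the <detected:url="..."/> form from the question.
function mark_urls($text)
{
    // Simple pattern: a URL runs until whitespace, a double quote or an angle bracket
    return preg_replace_callback(
        '#https?://[^\s<>"]+#i',
        function ($m) {
            return '<detected:url="' . $m[0] . '"/>';
        },
        $text
    );
}

echo mark_urls('See http://stackoverflow.com for details'), "\n";
// See <detected:url="http://stackoverflow.com"/> for details
```

The callback is also the natural place to call something like get_detected_and_title() for each match if the title is needed as well.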

Parsing GET request parameters in a URL that contains another URL

Here is the url:
http://localhost/test.php?id=http://google.com/?var=234&key=234
And I can't get the full value from $_GET['id'] or $_REQUEST['id'].
<?php
print_r($_REQUEST['id']);
//And this is the output http://google.com/?var=234
//the **&key=234** part doesn't show
?>
$get_url = "http://google.com/?var=234&key=234";
$my_url = "http://localhost/test.php?id=" . urlencode($get_url);
$my_url outputs:
http://localhost/test.php?id=http%3A%2F%2Fgoogle.com%2F%3Fvar%3D234%26key%3D234
So now you can get this value using $_GET['id'] or $_REQUEST['id']; PHP has already decoded it.
echo $_GET["id"]; // no urldecode() needed: $_GET values are already decoded
Output
http://google.com/?var=234&key=234
To get every GET parameter:
foreach ($_GET as $key=>$value) {
echo "$key = " . $value . "<br />\n"; // $_GET values are already URL-decoded
}
$key is GET key and $value is GET value for $key.
Or you can use alternative solution to get array of GET params
$get_parameters = array();
if (isset($_SERVER['QUERY_STRING'])) {
$pairs = explode('&', $_SERVER['QUERY_STRING']);
foreach($pairs as $pair) {
$part = explode('=', $pair);
$get_parameters[$part[0]] = sizeof($part)>1 ? urldecode($part[1]) : "";
}
}
$get_parameters then holds the same URL-decoded values as $_GET.
When creating the URL, encode the value with urlencode:
$val=urlencode('http://google.com/?var=234&key=234');
Click here
and when fetching it, decode it with urldecode
You may have to use urlencode on the string 'http://google.com/?var=234&key=234'
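http_build_query() does the same encoding and also takes care of the key=value and & syntax. A small round-trip sketch using the URLs from the question:

```php
<?php
// Build the outer URL; http_build_query() percent-encodes the nested URL
$inner = 'http://google.com/?var=234&key=234';
$outer = 'http://localhost/test.php?' . http_build_query(array('id' => $inner));
echo $outer, "\n";
// http://localhost/test.php?id=http%3A%2F%2Fgoogle.com%2F%3Fvar%3D234%26key%3D234

// Reading it back: parse_str() decodes once, just like PHP does when filling $_GET
parse_str(parse_url($outer, PHP_URL_QUERY), $params);
echo $params['id'], "\n";
// http://google.com/?var=234&key=234
```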
I had a similar problem and ended up using parse_url and parse_str. As long as the URL in the parameter is correctly URL-encoded (which it definitely should be), this lets you access all the parameters of the actual URL as well as the parameters of the encoded URL inside the query parameter, like so:
$get_url = "http://google.com/?var=234&key=234";
$my_url = "http://localhost/test.php?id=" . urlencode($get_url);
function so_5645412_url_params($url) {
$url_comps = parse_url($url);
$query = $url_comps['query'];
$args = array();
parse_str($query, $args);
return $args;
}
$my_url_args = so_5645412_url_params($my_url); // Array ( [id] => http://google.com/?var=234&key=234 )
$get_url_args = so_5645412_url_params($my_url_args['id']); // Array ( [var] => 234, [key] => 234 )
You are using reserved characters such as ? and & in the parameter value.
Encode them before putting the value into the URL.
See these links:
http://antoine.goutenoir.com/blog/2010/10/11/php-slugify-a-string/
http://sourcecookbook.com/en/recipes/8/function-to-slugify-strings-in-php
You can also use urlencode:
$val=urlencode('http://google.com/?var=234&key=234');
The correct php way is to use parse_url()
http://php.net/manual/en/function.parse-url.php
(from php manual)
This function parses a URL and returns an associative array containing any of the various components of the URL that are present.
This function is not meant to validate the given URL, it only breaks it up into the above listed parts. Partial URLs are also accepted, parse_url() tries its best to parse them correctly.
// REQUEST_URI already contains the query string, so QUERY_STRING must not be appended again
$scheme = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') ? 'https' : 'http';
echo "$scheme://$_SERVER[HTTP_HOST]$_SERVER[REQUEST_URI]";
