My code is:
function getTitle($Url){
$str = file_get_contents($Url);
if(strlen($str)>0){
preg_match("/\<title\>(.*)\<\/title\>/",$str,$title);
return $title[1];
}
else
{
return false;
}
}
function getMetas($Url){
$str = file_get_contents($Url);
if(strlen($str)>0){
// preg_match("/\<title\>(.*)\<\/title\>/",$str,$title);
preg_match("/<meta name=\"description\" content=\"(.*?)\"/",$str,$title);
// preg_match( '<meta name="description".*content="([^"]+)">siU', $str, $title);
return $title[1];
}
else
{
return false;
}
}
//Example:
$url=$_POST['url'];
echo getTitle($url);
echo "<br><br>";
echo getMetas($url);
This does not show a result for every URL, for example http://google.com
Why are you using a regular expression to parse the <meta> tags?
PHP has a built-in function for parsing meta information, called get_meta_tags().
Illustration:
<?php
$tags = get_meta_tags('http://www.stackoverflow.com/');
echo "<pre>";
print_r($tags);
OUTPUT:
Array
(
[twitter:card] => summary
[twitter:domain] => stackoverflow.com
[og:type] => website
[og:image] => http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon@2.png?v=fde65a5a78c6
[og:title] => Stack Overflow
[og:description] => Q&A for professional and enthusiast programmers
[og:url] => http://stackoverflow.com/
)
As you can see, the title, image and description are all parsed, which is what you want.
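Note that get_meta_tags() does not return the page's <title> element. If you need that as well, here is a minimal sketch using DOMDocument on an HTML string. The function name extractTitleAndDescription and the sample markup are made up for illustration; in real use the HTML would come from something like file_get_contents($url).

```php
<?php
// Hypothetical helper: returns the <title> text and the description meta tag
// from an HTML string, or null for whichever one is missing.
function extractTitleAndDescription(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $result = ['title' => null, 'description' => null];

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $result['title'] = trim($titles->item(0)->textContent);
    }

    foreach ($doc->getElementsByTagName('meta') as $meta) {
        // The attribute value may be written in any case, e.g. name="Description".
        if (strtolower($meta->getAttribute('name')) === 'description') {
            $result['description'] = $meta->getAttribute('content');
            break;
        }
    }

    return $result;
}

// Sample markup invented for this example.
$html = '<html><head><title>Example Page</title>'
      . '<meta name="description" content="A short summary."></head>'
      . '<body></body></html>';

print_r(extractTitleAndDescription($html));
```

Unlike the regex in the question, this does not depend on the attribute order or on the tag being on a single line.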
I know the question is 1.5 years old, but if you are still looking for a solution, you can use https://urlmeta.org. It's a free API for extracting URL metadata.
You can check whether a URL already starts with http:// or https:// and add the scheme if it is missing:
$url='stackoverflow.com';
$http_check='http://';
$https_check='https://';
// Prepend http:// only when neither scheme is already present
if(substr($url,0,7)!=$http_check && substr($url,0,8)!=$https_check){
$url=$http_check.$url;
}
Then you can use the answer above:
<?php
$tags = get_meta_tags($url);
echo "<pre>";
print_r($tags);
Please help. The statement I am using to match a Pinterest username URL is:
$url = "http://pinterest.com/username";
preg_match("|^http(s)?://pinterest.com/(.*)?$|i", $url);
but preg_match returns 0.
You are missing the third parameter of the preg_match function.
$url = "http://pinterest.com/username";
preg_match("|^http(s)?://pinterest.com/(.*)?$|i", $url, $match);
print_r($match);
This results in:
Array
(
[0] => http://pinterest.com/username
[1] =>
[2] => username
)
Or in an if statement:
$url = "http://pinterest.com/username";
if (preg_match("|^http(s)?://pinterest.com/(.*)?$|i", $url, $match)) {
// true
}
<?php
$url = "http://pinterest.com/username";
if(preg_match("|^http(s)?://pinterest.com/(.*)?$|i", $url)){
echo "true";
}
else{
echo "false";
}
?>
output:
true
What else do you want?
No one has mentioned that the dots need to be escaped.
So more correct code would be something like this:
$url = "https://pinterest.com/username";
preg_match("|(?:https?://)?(?:www\.)?pinterest\.com/([^/]+)/?|i", $url, $match);
$match[1] will contain the username. I don't know what rules Pinterest has for usernames, so I just match everything up to the next slash.
It will work with links like:
https://pinterest.com/username/
https://www.pinterest.com/username
pinterest.com/username
and others.
Don't use this regular expression for validation.
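If you would rather not rely on a regex at all, parse_url() can split the URL for you. This is only a sketch: the function name pinterestUsername is made up, and it assumes the username is simply the first path segment.

```php
<?php
// Hypothetical helper: returns the first path segment of a pinterest.com URL,
// or null when the host is not pinterest.com or the path is empty.
function pinterestUsername(string $url): ?string
{
    // parse_url() needs a scheme to recognise the host, so add one if missing.
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }

    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'], $parts['path'])) {
        return null;
    }

    // Accept both pinterest.com and www.pinterest.com.
    $host = strtolower($parts['host']);
    if ($host !== 'pinterest.com' && $host !== 'www.pinterest.com') {
        return null;
    }

    // "/username/" -> "username"; keep only the first path segment.
    $segments = explode('/', trim($parts['path'], '/'));
    return $segments[0] !== '' ? $segments[0] : null;
}

var_dump(pinterestUsername('https://pinterest.com/username/'));
var_dump(pinterestUsername('pinterest.com/username'));
var_dump(pinterestUsername('http://example.com/username'));
```

Checking the host separately also rejects look-alike URLs such as http://example.com/pinterest.com/username, which an unanchored regex would accept.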
I have a small piece of code that checks a string for a URL and adds the <a href> tag to create a link. I also have it check the string for a YouTube link and then add rel="youtube" to the <a> tag.
How can I get the code to only add rel to the youtube links?
How can I get it to add a different rel to any type of image link?
$text = "http://site.com a site www.anothersite.com/ http://www.youtube.com/watch?v=UyxqmghxS6M here is another site";
$linkstring = preg_replace( '/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i', '\4', $text );
if(preg_match('/http:\/\/www\.youtube\.com\/watch\?v=[^&]+/', $linkstring, $vresult)) {
$linkstring = preg_replace( '/(http|ftp)+(s)?:(\/\/)((\w|\.)+)(\/)?(\S+)?/i', '<a rel="youtube" href="\0">\4</a>', $text );
$type= 'youtube';
}
else {
$type = 'none';
}
echo $text;
echo $linkstring, "<br />";
echo $type, "<br />";
Try http://simplehtmldom.sourceforge.net/.
Code:
<?php
include('simple_html_dom.php');
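$html = str_get_html('<a href="http://www.youtube.com/watch?v=UyxqmghxS6M">Link</a>');
$html->find('a', 0)->rel = 'youtube';
echo $html;
Output:
[username@localhost dom]$ php dom.php
<a href="http://www.youtube.com/watch?v=UyxqmghxS6M" rel="youtube">Link</a>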
You can build an entire page DOM or a simple single link with this library.
Detecting hostname of URL:
Pass the url to parse_url. parse_url returns an array of the URL parts.
Code:
print_r(parse_url('http://www.youtube.com/watch?v=UyxqmghxS6M'));
Output:
Array
(
[scheme] => http
[host] => www.youtube.com
[path] => /watch
[query] => v=UyxqmghxS6M
)
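Once you have the parts, deciding which rel to use is a matter of looking at the host and the path. A rough sketch follows; the function name relForUrl and the list of image extensions are my own choices, not something from the question.

```php
<?php
// Hypothetical helper: picks a rel value from the parsed URL,
// or returns null when the link is neither a YouTube nor an image link.
function relForUrl(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false) {
        return null;
    }

    $host = strtolower($parts['host'] ?? '');
    if ($host === 'youtube.com' || $host === 'www.youtube.com') {
        return 'youtube';
    }

    // Look at the file extension of the path only, ignoring the query string.
    $extension = strtolower(pathinfo($parts['path'] ?? '', PATHINFO_EXTENSION));
    if (in_array($extension, ['png', 'jpg', 'jpeg', 'bmp', 'gif'], true)) {
        return 'image';
    }

    return null;
}

var_dump(relForUrl('http://www.youtube.com/watch?v=UyxqmghxS6M'));
var_dump(relForUrl('http://site.com/bounty.png'));
var_dump(relForUrl('http://site.com'));
```

You would call this for each URL you find in the text and only add the rel attribute when the result is not null.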
Try the following:
//text
$text = "http://site.com/bounty.png a site www.anothersite.com/ http://www.youtube.com/watch?v=UyxqmghxS6M&featured=true here is another site";
//Youtube links
$pattern = "/(http:\/\/){0,1}(www\.){0,1}youtube\.com\/watch\?v=([a-z0-9\-_\|]{11})[^\s]*/i";
$replacement = '<a rel="youtube" href="http://www.youtube.com/watch?v=\3">\0</a>';
$text = preg_replace($pattern, $replacement, $text);
//image links
$pattern = "/(http:\/\/){0,1}(www\.){0,1}[^\/]+\/[^\s]+\.(png|jpg|jpeg|bmp|gif)[^\s]*/i";
$replacement = '<a rel="image" href="\0">\0</a>';
$text = preg_replace($pattern, $replacement, $text);
Note that the latter can only detect links to images that have a file extension. Links like www.example.com?image=3 will not be detected.
I'm using some regexes to parse wiki-styled text.
<?php
function wikiParser($data){
$data = preg_replace('/\[\[Youtube:([a-zA-Z0-9_]+)\]\]/', getYoutubeTitle("$1"), $data);
return $data;
}
?>
This function searches for strings like [[Youtube:b32hRITAAew]] and calls another function, getYoutubeTitle(b32hRITAAew).
<?php
function getYoutubeTitle($hash){
$url = 'http://gdata.youtube.com/feeds/api/videos?v=2&q='.$hash.'&max-results=1&fields=entry(title)&prettyprint=true';
$fp = fopen($url, 'r');
$page = '';
while(!feof($fp)){
$page .= fgets($fp, 4096);
}
$titre = eregi("<title>(.*)</title>", $page, $regs);
return $regs[1];
}
?>
The second function parses the response data. In the case of b32hRITAAew code, the following url is accessed
http://gdata.youtube.com/feeds/api/videos?v=2&q=b32hRITAAew&max-results=1&fields=entry(title)&prettyprint=true
It outputs:
<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
<entry>
<title>The Lord of the Rings Symphony (1) HQ</title>
</entry>
</feed>
And the title should be The Lord of the Rings Symphony (1) HQ. But for some unknown reason it shows me a random title, Photography Trick - Easy Image Stabilizer For Any Camera. I've worked hard on this issue, but I still can't figure out where that comes from.
Is there a problem with getYoutubeTitle("$1"), or with something else?
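In your code, getYoutubeTitle("$1") is called before preg_replace runs, so it receives the literal string "$1" rather than the matched ID, and whatever it returns is then used as the replacement for every match. You can use the preg_replace_callback function instead, like this: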
function wikiParser($data){
$data = preg_replace_callback('/\[\[Youtube:([a-zA-Z0-9_]+)\]\]/', "getYoutubeTitle", $data);
return $data;
}
function getYoutubeTitle($array){
// $array looks like this: Array ( [0] => [[Youtube:b32hRITAAew]] [1] => b32hRITAAew )
$hash = array_pop($array);
$url = 'http://gdata.youtube.com/feeds/api/videos?v=2&q='.$hash.'&max-results=1&fields=entry(title)&prettyprint=true';
...
}
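To see the callback mechanism on its own, here is a small self-contained sketch that does not touch the network. The names lookupTitle and wikiParserDemo and the hard-coded title table are invented for the example; the table stands in for the real YouTube request.

```php
<?php
// Stand-in for the real lookup: the title table is invented for this example,
// where the original code would query YouTube instead.
function lookupTitle(array $matches): string
{
    $titles = ['b32hRITAAew' => 'The Lord of the Rings Symphony (1) HQ'];

    // $matches[0] is the whole tag, $matches[1] is the captured video ID.
    $id = $matches[1];

    // Leave the tag unchanged when the ID is unknown.
    return $titles[$id] ?? $matches[0];
}

function wikiParserDemo(string $data): string
{
    // The callback runs once per match, after the regex has found the ID.
    return preg_replace_callback('/\[\[Youtube:([a-zA-Z0-9_]+)\]\]/', 'lookupTitle', $data);
}

echo wikiParserDemo('Now playing: [[Youtube:b32hRITAAew]]'), "\n";
```

Using $matches[1] instead of array_pop() keeps working if you later add more capture groups to the pattern.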
I need a regex that will give me the string inside the quotes of an href attribute.
For example, I need to extract theurltoget.com from the following:
<a href="theurltoget.com">URL</a>
Additionally, I only want the base URL part, i.e. from http://www.mydomain.com/page.html I only want http://www.mydomain.com/
Don't use regex for this. You can use XPath and built-in PHP functions to get what you want:
// Note: simplexml_load_string() requires well-formed markup
$xml = simplexml_load_string($myHtml);
$list = $xml->xpath("//@href");
$preparedUrls = array();
foreach($list as $item) {
// Cast the attribute node to a string before parsing it
$item = parse_url((string) $item);
$preparedUrls[] = $item['scheme'] . '://' . $item['host'] . '/';
}
print_r($preparedUrls);
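If the markup is ordinary HTML rather than well-formed XML, DOMDocument::loadHTML() is more forgiving. Here is a sketch of the same idea; the function name baseUrlsFromHtml is made up, and the sample link is only there for illustration.

```php
<?php
// Hypothetical helper: collects scheme://host/ for every <a href> in an HTML string.
function baseUrlsFromHtml(string $html): array
{
    $doc = new DOMDocument();
    // Tolerate imperfect HTML: collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $baseUrls = [];
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        $parts = parse_url($anchor->getAttribute('href'));
        // Skip relative links and anything parse_url() cannot split.
        if (is_array($parts) && isset($parts['scheme'], $parts['host'])) {
            $baseUrls[] = $parts['scheme'] . '://' . $parts['host'] . '/';
        }
    }

    return $baseUrls;
}

// Sample link invented for this example.
$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
print_r(baseUrlsFromHtml($html));
```

Extra attributes such as class or rel do not matter here, because the parser hands you the href value on its own.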
$html = '<a href="http://www.mydomain.com/page.html">URL</a>';
$url = preg_match('/<a href="(.+)">/', $html, $match);
$info = parse_url($match[1]);
echo $info['scheme'].'://'.$info['host']; // http://www.mydomain.com
This expression handles three cases:
no quotes
double quotes
single quotes
'/href=["\']?([^"\'>]+)["\']?/'
Use the answer by @Alec if you're only looking for the base URL part (the second part of the question by @David)!
$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/<a href="(.+)">/', $html, $match);
$info = parse_url($match[1]);
This will give you:
$info
Array
(
[scheme] => http
[host] => www.mydomain.com
[path] => /page.html" class="myclass" rel="myrel
)
So you can use $href = $info["scheme"] . "://" . $info["host"]
Which gives you:
// http://www.mydomain.com
If you are looking for the entire URL inside the href, you should use another regex, for instance the one provided by @user2520237.
$html = '<a href="http://www.mydomain.com/page.html" class="myclass" rel="myrel">URL</a>';
$url = preg_match('/href=["\']?([^"\'>]+)["\']?/', $html, $match);
$info = parse_url($match[1]);
This will give you:
$info
Array
(
[scheme] => http
[host] => www.mydomain.com
[path] => /page.html
)
Now you can use $href = $info["scheme"] . "://" . $info["host"] . $info["path"];
Which gives you:
// http://www.mydomain.com/page.html
http://www.the-art-of-web.com/php/parse-links/
Let's start with the simplest case - a well formatted link with no extra attributes:
/<a href=\"([^\"]*)\">(.*)<\/a>/iU
To replace all href values:
function replaceHref($html, $replaceStr)
{
$match = array();
// Capture every double-quoted href value inside an <a> tag
preg_match_all('/<a [^>]*href="([^"]+)"/', $html, $match);
// $match[1] holds the captured URLs, so loop over that, not over $match itself
for($j=0; $j<count($match[1]); $j++)
{
$html = str_replace($match[1][$j], $replaceStr.urlencode($match[1][$j]), $html);
}
return $html;
}
$replaceStr = "http://affilate.domain.com?cam=1&url=";
$replaceHtml = replaceHref($html, $replaceStr);
echo $replaceHtml;
This will handle the case where there are no quotes around the URL.
/<a [^>]*href="?([^">]+)"?>/
But seriously, do not parse HTML with regex. Use DOM or a proper parsing library.
/href="(https?:\/\/[^\/]*)/
I think you should be able to handle the rest.
Because lookbehind and lookahead are cool:
/(?<=href=\")[^\"]+(?=\")/
It will match only what you want, without the quotation marks:
Array (
[0] => theurltoget.com )
I am trying to get the search keyword from a referrer url. Currently, I am using the following code for Google urls. But sometimes it is not working...
$query_get = "(q|p)";
$referrer = "http://www.google.com/search?hl=en&q=learn+php+2&client=firefox";
preg_match('/[?&]'.$query_get.'=(.*?)[&]/',$referrer,$search_keyword);
Is there another/clean/working way to do this?
Thank you,
Prasad
If you're using PHP5 take a look at http://php.net/parse_url and http://php.net/parse_str
Example:
// The referrer
$referrer = 'http://www.google.com/search?hl=en&q=learn+php+2&client=firefox';
// Extract the query string from the URL
$parsed = parse_url( $referrer, PHP_URL_QUERY );
// Parse the query string into an array
parse_str( $parsed, $query );
// Output the result
echo $query['q'];
Different search engines use different query string parameters. After trying Wiliam's method, I figured out my own (because Yahoo uses 'p', but sometimes 'q').
$referrer = "http://search.yahoo.com/search?p=www.stack+overflow%2Ccom&ei=utf-8&fr=slv8-msgr&xargs=0&pstart=1&b=61&xa=nSFc5KjbV2gQCZejYJqWdQ--,1259335755";
$referrer_query = parse_url($referrer);
$referrer_query = $referrer_query['query'];
$q = "[qp]"; // Yahoo uses either parameter; I use a switch() for each search engine
preg_match('/(?:^|&)'.$q.'=(.*?)(?:&|$)/',$referrer_query,$keyword);
$keyword = urldecode($keyword[1]);
echo $keyword; //Outputs "www.stack overflow,com"
Thank you,
Prasad
To supplement the other answers, note that the query string parameter that contains the search terms varies by search provider. This snippet of PHP shows the correct parameter to use:
$search_engines = array(
'q' => 'alltheweb|aol|ask|ask|bing|google',
'p' => 'yahoo',
'wd' => 'baidu',
'text' => 'yandex'
);
Source: http://betterwp.net/wordpress-tips/get-search-keywords-from-referrer/
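Here is a sketch of how that table could be combined with parse_url() and parse_str(). The function name searchKeywordFromReferrer is made up, and matching the engine name against the host with a regex is just one simple way to do it.

```php
<?php
// Hypothetical helper built on the parameter table above.
function searchKeywordFromReferrer(string $referrer): ?string
{
    // Query parameter => engine names to look for in the host.
    $searchEngines = [
        'q'    => 'alltheweb|aol|ask|bing|google',
        'p'    => 'yahoo',
        'wd'   => 'baidu',
        'text' => 'yandex',
    ];

    $host  = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return null;
    }

    // parse_str() also URL-decodes the values ("+" becomes a space).
    parse_str($query, $params);

    foreach ($searchEngines as $param => $hosts) {
        // Match an engine name as a whole label of the host, e.g. "www.google.com".
        if (preg_match('/(^|\.)(' . $hosts . ')\./i', $host)
            && isset($params[$param]) && is_string($params[$param])) {
            return $params[$param];
        }
    }

    return null;
}

echo searchKeywordFromReferrer('http://www.google.com/search?hl=en&q=learn+php+2&client=firefox'), "\n";
```

This only uses the one parameter listed for each engine, so a Yahoo referrer that happens to carry 'q' instead of 'p' would need an extra entry.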
<?php
class GET_HOST_KEYWORD
{
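public function get_host_and_keyword($_url) {
$chunk_url = parse_url($_url);
$_data["host"] = isset($chunk_url['host'])?$chunk_url['host']:'';
// parse_str() needs a result array; the one-argument form was removed in PHP 8
$params = array();
parse_str(isset($chunk_url['query'])?$chunk_url['query']:'', $params);
// Yahoo uses "p"; most other engines use "q"
$_data["keyword"] = !empty($params['p'])?$params['p']:(!empty($params['q'])?$params['q']:'');
return $_data;
}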
}
// Sample Example
$obj = new GET_HOST_KEYWORD();
print_r($obj->get_host_and_keyword('http://www.google.co.in/search?sourceid=chrome&ie=UTF-&q=hire php php programmer'));
// sample output
//Array
//(
// [host] => www.google.co.in
// [keyword] => hire php php programmer
//)
// $search_engines = array(
// 'q' => 'alltheweb|aol|ask|ask|bing|google',
// 'p' => 'yahoo',
// 'wd' => 'baidu',
// 'text' => 'yandex'
//);
?>
$query = parse_url($request, PHP_URL_QUERY);
This one should work for Google, Bing and, sometimes, Yahoo Search:
if( isset($_SERVER['HTTP_REFERER']) && $_SERVER['HTTP_REFERER']) {
$query = getSeQuery($_SERVER['HTTP_REFERER']);
echo $query;
} else {
echo "I think they spelled REFERER wrong? Anyways, your browser says you don't have one.";
}
function getSeQuery($url = false) {
$segments = parse_url($url);
$keywords = null;
if($query = isset($segments['query']) ? $segments['query'] : (isset($segments['fragment']) ? $segments['fragment'] : null)) {
parse_str($query, $segments);
$keywords = isset($segments['q']) ? $segments['q'] : (isset($segments['p']) ? $segments['p'] : null);
}
return $keywords;
}
I believe Google and Yahoo have changed what they send in the referrer: the search keywords and other parameters are no longer included in the URL, so they cannot be read from the HTTP referrer.
Please let me know if the recommendations above still give you the search keywords.
This is what I now receive as the HTTP referrer on my website:
from google: https://www.google.co.in/
from yahoo: https://in.yahoo.com/
Ref: https://webmasters.googleblog.com/2012/03/upcoming-changes-in-googles-http.html