PHP - Get Website Title From User Site Input - php

I'm trying to get the title of a website that is entered by the user.
Text input: a website link entered by the user is sent to the server via AJAX.
The user can input anything: an actual existing link, a single word, or something weird like 'po392#*#8'.
Here is a part of my PHP script:
// Make sure the url is on another host
if(substr($url, 0, 7) !== "http://" AND substr($url, 0, 8) !== "https://") {
$url = "http://".$url;
}
// Extra confirmation for security
if (filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_HOST_REQUIRED)) {
$urlIsValid = "1";
} else {
$urlIsValid = "0";
}
// Make sure there is a dot in the url
if (strpos($url, '.') !== false) {
$urlIsValid = "1";
} else {
$urlIsValid = "0";
}
// Retrieve title if no title is entered
if($title == "" AND $urlIsValid == "1") {
function get_http_response_code($theURL) {
$headers = get_headers($theURL);
if($headers) {
return substr($headers[0], 9, 3);
} else {
return 'error';
}
}
if(get_http_response_code($url) != "200") {
$urlIsValid = "0";
} else {
$file = file_get_contents($url);
$res = preg_match("/<title>(.*)<\/title>/siU", $file, $title_matches);
if($res === 1) {
$title = preg_replace('/\s+/', ' ', $title_matches[1]);
$title = trim($title);
$title = addslashes($title);
}
// If title is still empty, make title the url
if($title == "") {
$title = $url;
}
}
}
However, there are still errors occurring in this script.
It works perfectly when an existing URL such as 'https://www.youtube.com/watch?v=eB1HfI-nIRg' is entered, and when a non-existing page such as 'https://www.youtube.com/watch?v=NON-EXISTING' is entered, but it doesn't work when the user enters something like 'twitter.com' (without http) or something like 'yikes'.
I tried literally everything: cURL, DOMDocument...
The problem is that when an invalid link is entered, the AJAX call never completes (it keeps loading), while it should set $urlIsValid = "0" whenever an error occurs.
I hope someone can help me - it's appreciated.
Nathan

You have a relatively simple problem but your solution is too complex and also buggy.
These are the problems that I've identified with your code:
// Make sure the url is on another host
if(substr($url, 0, 7) !== "http://" AND substr($url, 0, 8) !== "https://") {
$url = "http://".$url;
}
This doesn't make sure that the URL is on another host (it could still be localhost); it only prepends a scheme. You should remove this code.
// Make sure there is a dot in the url
if (strpos($url, '.') !== false) {
$urlIsValid = "1";
} else {
$urlIsValid = "0";
}
This code overwrites the result of the check above it, where you validate that the string is indeed a valid URL, so remove it.
The additional function get_http_response_code is pointless. You can use file_get_contents alone to get the HTML of the remote page and compare the result against false to detect an error.
Also, from your code I conclude that no external fetch happens unless the (externally defined) variable $title is empty, so why not check that first?
To sum it up, your code should look something like this:
if('' === $title && filter_var($url, FILTER_VALIDATE_URL))
{
//@ means we suppress warnings as we won't need them
//this could be done with error_reporting(0) or similar side-effect method
$html = getContentsFromUrl($url);
if(false !== $html && preg_match("/<title>(.*)<\/title>/siU", $html, $title_matches))
{
$title = preg_replace('/\s+/', ' ', $title_matches[1]);
$title = trim($title);
$title = addslashes($title);
}
// If title is still empty, make title the url
if($title == "") {
$title = $url;
}
}
function getContentsFromUrl($url)
{
//if not full/complete url
if(!preg_match('#^https?://#ims', $url))
{
$completeUrl = 'http://' . $url;
$result = @file_get_contents($completeUrl);
if(false !== $result)
{
return $result;
}
//we try with https://
$url = 'https://' . $url;
}
return @file_get_contents($url);
}
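The two pure steps discussed above (normalizing the user input and pulling the title out of the HTML) can be sketched without any network access. This is only a minimal sketch: normalizeUrl and extractTitle are hypothetical names, not part of the code above. Note that FILTER_VALIDATE_URL rejects input without a scheme, so the sketch adds the scheme before validating, and it assumes that a host without a dot (such as 'yikes' or 'localhost') should be rejected.

```php
<?php
// Hypothetical helper: add a scheme if it is missing, then validate.
// Returns the normalized URL, or null when the input cannot be a usable URL.
function normalizeUrl(string $input): ?string
{
    $url = trim($input);
    if ($url === '') {
        return null;
    }
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // Assumption: require a dot in the host, so single words like "yikes"
    // are rejected before any network request is made.
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || strpos($host, '.') === false) {
        return null;
    }
    return $url;
}

// Hypothetical helper: pull the first <title> out of an HTML string.
// Returns null when there is no title or it is empty.
function extractTitle(string $html): ?string
{
    if (!preg_match('#<title[^>]*>(.*?)</title>#si', $html, $m)) {
        return null;
    }
    // Decode entities, collapse runs of whitespace, and trim.
    $title = trim(preg_replace('/\s+/', ' ', html_entity_decode($m[1])));
    return $title === '' ? null : $title;
}
```

With these in place, the fetch only happens when normalizeUrl returns a string. If the AJAX call still hangs on unreachable hosts, passing a stream context with a timeout to file_get_contents (for example stream_context_create(['http' => ['timeout' => 5]])) is one way to bound the wait.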

Related

check if given domain name present in set of urls php

I have an url whose format may be :
www.discover.com
http://discover.com
http://www.discover.com
http://www.abcd.discover.com
discover.com
And i have another url which may be any of below format:
www.discover.com/something/smoething
http://discover.com/something/smoething
http://www.discover.com/something/smoething
http://www.abcd.discover.com/something/smoething
discover.com/something/smoething
Now i want to compare this two urls to check whether domain name "discover.com" is present in the second url.
Am using below code :
$domain1 = str_ireplace('www.', '', parse_url($urlItem1, PHP_URL_HOST));
$domain2= str_ireplace('www.', '', parse_url($urlItem2, PHP_URL_HOST));
if(strstr($domain2, $domain1))
{
return $domain2;
}
Solution :
function url_comparison($url1, $url2) {
$domain1 = parse_url($url1,PHP_URL_HOST);
$domain2 = parse_url($url2,PHP_URL_HOST);
$domain1 = isset($domain1) ? str_ireplace('www.', '',$domain1) : str_ireplace('www.', '',$url1);
$domain2 = isset($domain2) ? str_ireplace('www.', '',$domain2) : str_ireplace('www.', '',$url2);
if(strstr($domain2, $domain1))
{
return true;
}
else
{
return false;
}
}
$url1 = "discover.com";
$url2 = "https://www.abcd.discover.com/credit-cards/resources/balance-transfer.shtml";
if(url_comparison($url1, $url2))
{
echo "Same Domain";
}
else
{
echo "Different Domain";
}
Thanks.
Make use of the documentation for parse_url.
Then look at the host name with strpos. Note that parse_url only returns a host when the URL has a scheme, so add one if it is missing.
$url = parse_url('http://www.discover.com/something/something');
if (isset($url['host']) && strpos($url['host'], 'discover.com') !== false) {
// do your thing
}
strpos can return 0, which is also a valid position, so the strict !== (or ===) comparison is needed.
To check whether two domains are equal you need to set some rules: is www.example.com the same as example.com, and is https the same as http?
function url_comparison($url_1, $url_2, $www = false, $scheme = false) {
$url_part_1 = parse_url($url_1);
$url_part_2 = parse_url($url_2);
if ($scheme && $url_part_1['scheme'] !== $url_part_2['scheme']) {
return false;
}
if ($www && $url_part_1['host'] !== $url_part_2['host']) {
// strict mode: hosts must be identical, including any 'www.'
return false;
} elseif(!$www && strpos($url_part_1['host'], $url_part_2['host']) === false && strpos($url_part_2['host'], $url_part_1['host']) === false) {
// loose mode: one host must contain the other
return false;
}
return true;
}
The function above should point you in the right direction; it is not tested, so it may need tweaking. The first two arguments should be URLs that include a scheme. $www is a boolean that says whether the 'www.' prefix should be taken into account, and if $scheme is true the scheme (http or https) must match as well.
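A plain substring test also matches hosts such as notdiscover.com or discover.com.evil.example, so a suffix comparison on the host may be safer. The following is a minimal sketch under that assumption; hostMatchesDomain is a hypothetical name, and the sketch treats a host as matching when it is the domain itself or any subdomain of it.

```php
<?php
// Hypothetical helper: true when $url's host is $domain or a subdomain of it.
function hostMatchesDomain(string $url, string $domain): bool
{
    // parse_url() only reports a host when a scheme is present,
    // so add one for inputs like "discover.com/something".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    $host = strtolower($host);
    $domain = strtolower(trim($domain));
    $suffix = '.' . $domain;
    // Exact match, or the host ends with ".domain".
    return $host === $domain
        || substr($host, -strlen($suffix)) === $suffix;
}
```

For example, hostMatchesDomain('www.discover.com/something/something', 'discover.com') is true, while a host like notdiscover.com is rejected.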

PHP - Check if URL exists in link

Here is what I want to do..
Let's say I am looking for the link "example.com" in a file at http://example.com/test.html.
I want a PHP script that looks for an <a> tag with that link in the mentioned page. However, I also need it to work if there is a class or ID attribute in the <a>.
See below url
How can I check if a URL exists via PHP?
or try it
$file = 'http://www.domain.com/somefile.jpg';
$file_headers = @get_headers($file);
if($file_headers[0] == 'HTTP/1.1 404 Not Found') {
$exists = false;
}
else {
$exists = true;
}
From here: http://www.php.net/manual/en/function.file-exists.php#75064
...and right below the above post, there's a curl solution:
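function url_exists($url) {
if (!$ch = curl_init($url)) return false;
// curl_init alone sends nothing, so perform a HEAD request and read the status
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
return $code >= 200 && $code < 400;
}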
Updated answer:
You can use the SimpleHtmlDom class to find an id or class in a tag.
See the URLs below:
http://simplehtmldom.sourceforge.net/
http://simplehtmldom.sourceforge.net/manual_api.htm
http://sourceforge.net/projects/simplehtmldom/files/
http://davidwalsh.name/php-notifications
Here is what I have found in case anyone else needs it also!
$url = "http://example.com/test.html";
$searchFor = "example.com";
$input = @file_get_contents($url) or die("Could not access file: $url");
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
$isMatch = 0;
if(preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
foreach($matches as $match) {
echo $match[2];
if ($match[2] == $searchFor)
{
$isMatch = 1;
break; // stop at the first match so later links don't reset the flag
}
// $match[0] = A tag
// $match[2] = link address
// $match[3] = link text
}
}
if ($isMatch)
{
echo "<p><font color=red size=5 align=center>The page specified does contain your link. You have been credited the award amount!</font></p>";
} else {
echo "<p><font color=red size=5 align=center>The specified page does not have your referral link.</font></p>";
}

PHP: checking if YT url is valid and if video exists

The purpose of the function is to validate the URL of a YouTube video and check whether the video exists. This is a snippet of my actual code. I manipulate the string into my desired format and then proceed to check whether it is valid and exists. If it passes the test, I echo the results. The problem is that I am not calling the function correctly.
I am getting this echo even though the video does exist:
The video does not exist or invalid url
Edited: and added isValidURL function
*Code for checking if video exist or is invalid:*
if($_POST)
{
// After applying url manipulation and getting the url in a proper format result = $formatted_url
function isValidURL($formatted_url) {
$formatted_url = trim($formatted_url);
$isValid = true;
if (strpos($formatted_url, 'http://') === false && strpos($formatted_url, 'https://') === false) {
$formatted_url = 'http://'.$formatted_url;
}
//first check with php's FILTER_VALIDATE_URL
if (filter_var($formatted_url, FILTER_VALIDATE_URL, FILTER_FLAG_HOST_REQUIRED) === false) {
$isValid = false;
} else {
//not all invalid URLs are caught by FILTER_VALIDATE_URL
//use our own mechanism
$host = parse_url($formatted_url, PHP_URL_HOST);
$dotcount = substr_count($host, '.');
//the host should contain at least one dot
if ($dotcount > 0) {
//if the host contains one dot
if ($dotcount == 1) {
//and it start with www.
if (strpos($host, 'www.') === 0) {
//there is no top level domain, so it is invalid
$isValid = false;
}
} else {
//the host contains multiple dots
if (strpos($host, '..') !== false) {
//dots can't be next to each other, so it is invalid
$isValid = false;
}
}
} else {
//no dots, so it is invalid
$isValid = false;
}
}
//return false if host is invalid
//otherwise return true
return $isValid;
}
$isValid = getYoutubeVideoID($formatted_url);
function isYoutubeVideo($formatted_url) {
$isValid = false;
//validate the url, see: http://snipplr.com/view/50618/
if (isValidURL($formatted_url)) {
//code adapted from Moridin: http://snipplr.com/view/19232/
$idLength = 11;
$idOffset = 3;
$idStarts = strpos($formatted_url, "?v=");
if ($idStarts !== FALSE) {
//there is a videoID present, now validate it
$videoID = substr($formatted_url, $idStarts + $idOffset, $idLength);
$http = new HTTP("http://gdata.youtube.com");
$result = $http->doRequest("/feeds/api/videos/".$videoID, "GET");
//returns Array('headers' => Array(), 'body' => String);
$code = $result['headers']['http_code'];
//did the request return a http code of 2xx?
if (substr($code, 0, 1) == 2) {
$isValid = true;
}
}
}
return $isValid;
}
$isValid = isYoutubeVideo($formatted_url);
parse_str($parsed_url['query'], $parsed_query_string);
$v = $parsed_query_string['v'];
if ( $isValid == true ) {
//Iframe code
echo htmlentities ('<iframe src="http://www.youtube.com/embed/'.$v.'" frameborder="0" width="'.$wdth.'" height="'.$hth.'"></iframe>');
//Old way to embed code
echo htmlentities ('<embed src="http://www.youtube.com/v/'.$v.'" width="'.$wdth.'" height="'.$hth.'" type="application/x-shockwave-flash" wmode="transparent" embed="" /></embed>');
}
else {
echo ("The video does not exist or invalid url");
}
}
?>
You are missing the isValidURL() function. Try changing this line:
if (isValidURL($formatted_url)) {
to
if(preg_match('/http:\/\/www\.youtube\.com\/watch\?v=[^&]+/', $formatted_url, $result)) {
or
$test = parse_url($formatted_url);
if($test['host']=="www.youtube.com"){

Don't check the domain name only

Soooo it's me again with this function..
I have the function working.
function http_file_exists($url)
{
$f = @fopen($url,"r");
if($f)
{
fclose($f);
return true;
}
return false;
}
And this is the usage :
if ($submit || $preview || $refresh)
{
$post_data['your_url'] = "http://www.google.com/this"; //remove the equals and url value if using in real post
$your_url = $post_data['your_url'];
$your_url_exists = (isset($your_url)) ? true : false;
$your_url = preg_replace(array('#&\#46;#','#&\#58;#','/\[(.*?)\]/'), array('.',':',''), $your_url);
if ($your_url_exists && http_file_exists($your_url) == true)
{
trigger_error('exists!');
}
How do I let it check the whole url and not the domain name only ? for example http://www.google.com/this
url tested is http://www.google.com/abadurltotest
source of code below = What is the fastest way to determine if a URL exists in PHP?
function http_file_exists($url)
{
//$url = preg_replace(array('#&\#46;#','#&\#58;#','/\[(.*?)\]/'), array('.',':',''), $url);
$url_data = parse_url ($url);
if (!$url_data) return FALSE;
$errno=0;
$errstr="";
$fp=fsockopen($url_data['host'],80,$errno,$errstr,30);
//fsockopen returns false on failure, never 0
if($fp===false) return FALSE;
$path ='/';
if (isset( $url_data['path'])) $path = $url_data['path'];
if (isset( $url_data['query'])) $path .= '?' .$url_data['query'];
//$path already starts with "/", so don't add another one
$out="GET $path HTTP/1.1\r\n";
$out.="Host: {$url_data['host']}\r\n";
$out.="Connection: Close\r\n\r\n";
fwrite($fp,$out);
$content=fgets($fp);
$code=trim(substr($content,9,4)); //get http code
fclose($fp);
// if http code is 2xx or 3xx url should work
return ($code[0] == 2 || $code[0] == 3) ? TRUE : FALSE;
}
Add the code above to functions_posting.php, replacing the previous function.
if ($submit || $preview || $refresh)
{
$post_data['your_url'] = " http://www.google.com/abadurltotest";
$your_url = $post_data['your_url'];
$your_url_exists = (request_var($your_url, '')) ? true : false;
$your_url = preg_replace(array('#&\#46;#','#&\#58;#','/\[(.*?)\]/'), array('.',':',''), $your_url);
if ($your_url_exists === true && http_file_exists($your_url) === false)
{
trigger_error('A bad url was entered, Please push the browser back button and try again.');
}
Use curl and check the HTTP status code. If it's not 200, the URL most likely doesn't exist or is inaccessible.
Also note that
$your_url_exists = (isset($your_url)) ? true : false;
makes no sense. It seems you want
$your_url_exists = (bool)$your_url;
or just check $your_url instead of $your_url_exists.
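The curl suggestion could look roughly like the sketch below. It assumes the curl extension is loaded; fetchStatusCode and isSuccessStatus are hypothetical names, and treating 2xx and 3xx as "exists" is a choice borrowed from the comment in the fsockopen version, not a rule. The timeouts are there so that a dead host cannot stall the request indefinitely.

```php
<?php
// Hypothetical helper: decide whether a status code counts as "the URL exists".
function isSuccessStatus(int $code): bool
{
    return $code >= 200 && $code < 400;
}

// Hypothetical helper: send a HEAD request and return the HTTP status code,
// or 0 when the request could not be made at all (DNS failure, timeout, ...).
function fetchStatusCode(string $url, int $timeoutSeconds = 5): int
{
    $ch = curl_init($url);
    if ($ch === false) {
        return 0;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // headers only, no body download
        CURLOPT_RETURNTRANSFER => true,  // don't echo the response
        CURLOPT_FOLLOWLOCATION => true,  // report the status after redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $ok = curl_exec($ch);
    return $ok === false ? 0 : (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
}
```

Usage would then be something like isSuccessStatus(fetchStatusCode($your_url)), which checks the whole URL (path and query included), not only the domain name.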

How to parse a youtube url to obtain the video or playlist ids?

I'm looking for a way to extract both (partial) YouTube URLs and single IDs from a user input string.
This article, How do I find all YouTube video ids in a string using a regex?, got me going quite well, but I'm still struggling a bit.
Is there a way to find playlist and/or video IDs in strings ranging from:
E4uySuFiCis
PLBE0103048563C552
Through:
?v=4OfUVmfNk4E&list=PLBE0103048563C552&index=5
http://www.youtube.com/watch?v=4OfUVmfNk4E&list=PLBE0103048563C552&index=5
Use:
$urlInfo = parse_url($url); // to get url components (scheme, host, query)
$urlVars = array();
parse_str(isset($urlInfo['query']) ? $urlInfo['query'] : '', $urlVars); // to get the query vars
Check out the YouTube API for more details on the format.
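To cover the bare IDs and the partial query string from the question as well, the same idea can be extended as in the sketch below. extractYoutubeIds is a hypothetical name, and the sketch rests on two assumptions that are not taken from any official specification: video IDs are exactly 11 characters from [A-Za-z0-9_-], and a bare playlist ID starts with "PL".

```php
<?php
// Hypothetical helper: returns ['video' => ?string, 'playlist' => ?string].
function extractYoutubeIds(string $input): array
{
    $input = trim($input);
    $ids = ['video' => null, 'playlist' => null];

    // If there is a "?", treat everything after it (minus any #fragment)
    // as the query string. This covers full URLs and partial ones like "?v=...".
    $pos = strpos($input, '?');
    if ($pos !== false) {
        $query = explode('#', substr($input, $pos + 1), 2)[0];
        parse_str($query, $vars);
        if (isset($vars['v']) && is_string($vars['v'])
            && preg_match('/^[A-Za-z0-9_-]{11}$/', $vars['v'])) {
            $ids['video'] = $vars['v'];
        }
        if (isset($vars['list']) && is_string($vars['list'])
            && preg_match('/^[A-Za-z0-9_-]+$/', $vars['list'])) {
            $ids['playlist'] = $vars['list'];
        }
        return $ids;
    }

    // Bare ID. Assumption: exactly 11 characters means a video,
    // otherwise a "PL" prefix means a playlist.
    if (preg_match('/^[A-Za-z0-9_-]{11}$/', $input)) {
        $ids['video'] = $input;
    } elseif (preg_match('/^PL[A-Za-z0-9_-]+$/', $input)) {
        $ids['playlist'] = $input;
    }
    return $ids;
}
```

Either entry stays null when the corresponding ID is missing or does not look valid, so the caller can tell a video-only input from a playlist-only one.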
I wrote a script to do this once where the YouTube URL is posted via "POST" under the key "l" (lowercase "L").
Unfortunately I never got round to incorporating it into my project so it's not been extensively tested to see how it does. If it fails it calls invalidURL with the URL as a parameter, if it succeeds it calls validURL with the ID from the URL.
This script may not be exactly what you're after because it ONLY retrieves the ID of the video currently playing - but you should be able to modify it easily.
if (isset($_POST['l'])) {
$ytIDLen = 11;
$link = $_POST['l'];
if (preg_match('|^http(s)?://[a-z0-9-]+(\.[a-z0-9-]+)*(:[0-9]+)?(/.*)?$|i', $link)) {
$urlParts = parse_url($link);
//$scheme
//$host
//$path
//$query["v"]
if (isset($urlParts["scheme"])) {
if ( ($urlParts["scheme"] == "http" ) || ($urlParts["scheme"] == "https") ) {
//$scheme = "http";
} else invalidURL($link);
} //else $scheme = "http";
if (isset($urlParts["host"])) {
if ( ($urlParts["host"] == "www.youtube.com") || ($urlParts["host"] == "www.youtube.co.uk") || ($urlParts["host"] == "youtube.com") || ($urlParts["host"] == "youtube.co.uk")) {
//$host = "www.youtube.com";
if (isset($urlParts["path"])) {
if ($urlParts["path"] == "/watch") {
//$path = "/watch";
if (isset($urlParts["query"])) {
$query = array();
parse_str($urlParts["query"],$query);
if (isset($query["v"])) {
//keep "-" and "_": both occur in YouTube video IDs
$query["v"] = preg_replace("/[^a-zA-Z0-9_-]/", "", $query["v"]);
if (strlen($query["v"]) == $ytIDLen) {
validUrl($query["v"]);
} else invalidURL($link);
} else invalidURL($link);
} else invalidURL($link);
} else invalidURL($link);
} else invalidURL($link);
} else invalidURL($link);
} else invalidURL($link);
} else invalidURL($link);
}
