PHP: Array comparison issue with URLs

I have put together PHP code that reports whether a YouTube (YT) video is valid or invalid. Since I only want URLs from the YouTube domains, I have set up an array of allowed hosts so that any other URL returns an error. The problem is that when I enter a URL in a format like www.youtube.com/v/NLqAF9hrVbY, I get the last error message, but when I add http:// in front of that URL it works fine. I am checking $url with parse_url() and PHP_URL_HOST.
Why does it reject URLs from an allowed domain when they have no http://?
<?php
if ($_POST) {
    $url = $_POST['name'];
    // $url is a string, so test it directly (not $url['name'])
    if (!empty($url)) {
        $allowed_domains = array(
            'www.youtube.com',
            'gdata.youtube.com',
            'youtu.be',
            'youtube.com',
            'm.youtube.com'
        );
        if (in_array(parse_url($url, PHP_URL_HOST), $allowed_domains)) {
            $formatted_url = getYoutubeVideoID($url);
            $parsed_url = parse_url($formatted_url);
            parse_str($parsed_url['query'], $parsed_query_string);
            $videoID = $parsed_query_string['v'];
            if ($videoID != null) {
                // Only query the API once we actually have an ID
                $headers = get_headers('http://gdata.youtube.com/feeds/api/videos/' . $videoID);
                if (strpos($headers[0], '200') !== false) {
                    echo('<div id="special"><span id="resultval">That is a valid youtube video</span></div>');
                } else {
                    echo('<div id="special"><span id="resultval">The YouTube video does not exist.</span></div>');
                    return false;
                }
            } else {
                echo('<div id="special"><span id="resultval">The YouTube video does not exist.</span></div>');
                return false;
            }
        } else {
            echo ('<div id="special"><span id="resultval">Please include a video URL from Youtube.</span></div>');
        }
    }
}
?>

parse_url() needs a URL with a scheme (protocol identifier such as http://) to find the host. For www.youtube.com/v/NLqAF9hrVbY it treats the whole string as the path, so parse_url($url, PHP_URL_HOST) is null and the in_array() comparison fails.
You can fix this by adding a scheme when it is missing:
if (!preg_match('#^https?://#i', $url)) {
    $url = "http://" . $url;
}
Do this before performing
if (in_array(parse_url($url, PHP_URL_HOST), $allowed_domains)) { ... }
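A minimal sketch of the same fix as a reusable check, assuming you also want to accept https:// and upper-case input. normalizeUrl() and isAllowedYoutubeUrl() are hypothetical helper names, not part of the question's code; the allowed-host list is copied from it.

```php
<?php
// Hypothetical helper: prepend http:// when the input has no http(s) scheme.
function normalizeUrl(string $url): string
{
    $url = trim($url);
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    return $url;
}

// Hypothetical helper: true when the URL's host is one of the allowed YouTube hosts.
function isAllowedYoutubeUrl(string $url): bool
{
    $allowedDomains = [
        'www.youtube.com',
        'gdata.youtube.com',
        'youtu.be',
        'youtube.com',
        'm.youtube.com',
    ];
    $host = parse_url(normalizeUrl($url), PHP_URL_HOST);
    // Host names are case-insensitive, so compare in lower case.
    return is_string($host) && in_array(strtolower($host), $allowedDomains, true);
}

// Without a scheme the whole string is parsed as a path, so there is no host.
var_dump(parse_url('www.youtube.com/v/NLqAF9hrVbY', PHP_URL_HOST)); // NULL
var_dump(isAllowedYoutubeUrl('www.youtube.com/v/NLqAF9hrVbY'));     // bool(true)
```

The two var_dump() lines show the difference: the bare URL has no host until a scheme is added.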

Related

PHP Strip domain name from url

I know there is a lot of information on the web about this subject, but I can't get it to work the way I want.
I'm trying to build a function that strips a URL down to its domain name:
http://blabla.com => blabla
www.blabla.net => blabla
http://www.blabla.eu => blabla
Only the plain name of the domain is needed.
With parse_url() I get the host, but that is not enough.
I have 3 functions that strip the domain, but I still get some wrong outputs:
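// Split a newline-separated list of domains into a trimmed array
function prepare_array($domains)
{
    $prep_domains = explode("\n", str_replace("\r", "", $domains));
    $domain_array = array_map('trim', $prep_domains);
    return $domain_array;
}

// Returns the second dot-separated label: 'blabla' for 'www.blabla.net',
// but 'com' for 'blabla.com'
function test($domain)
{
    $domain = explode(".", $domain);
    return $domain[1];
}

// Remove a leading "http://" and/or "www." and everything from the first "/"
function strip($url)
{
    $url = trim($url);
    $url = preg_replace("/^(http:\/\/)*(www\.)*/is", "", $url);
    $url = preg_replace("/\/.*$/is", "", $url);
    return $url;
}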
Every possible domain, URL and extension is allowed. When the function is finished, it must return an array of only the domain names themselves.
UPDATE:
Thanks for all the suggestions!
I figured it out with help from all of you.
function test($url)
{
    // Check whether the URL begins with http://, www. or both;
    // if so, remove it
    if (preg_match("/^(http:\/\/|www\.)/i", $url))
    {
        $domain = preg_replace("/^(http:\/\/)*(www\.)*/is", "", $url);
    }
    else
    {
        $domain = $url;
    }
    // Now all that's left is the domain and the extension;
    // return only the first part, without the extension
    $domain = explode(".", $domain);
    return $domain[0];
}
How about
$wsArray = explode(".",$domain); //Break it up into an array.
$extension = array_pop($wsArray); //Get the Extension (last entry)
$domain = array_pop($wsArray); // Get the domain
http://php.net/manual/en/function.array-pop.php
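A sketch wrapping the array_pop() idea in a function, assuming the input may or may not carry a scheme. domainName() is a hypothetical name. It takes the host from parse_url() first so that paths are ignored, then pops the extension and returns the label before it.

```php
<?php
// Hypothetical helper built on the array_pop() idea.
// Assumes a one-part extension (.com, .net, .eu).
function domainName(string $url): ?string
{
    $url = trim($url);
    // parse_url() only finds the host when a scheme is present.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $labels = explode('.', $host); // e.g. ['www', 'blabla', 'eu']
    array_pop($labels);            // drop the extension ('eu')
    return array_pop($labels);     // the label before it ('blabla'), or null
}

foreach (['http://blabla.com', 'www.blabla.net', 'http://www.blabla.eu'] as $url) {
    echo $url, ' => ', domainName($url), PHP_EOL;
}
```

Like the snippet it is based on, this assumes a one-part extension; for a host such as www.bbc.co.uk it returns 'co', which is the problem the next answer addresses.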
Ah, your problem lies in the fact that TLDs can consist of one or two parts, e.g. .com vs. .co.uk.
What I would do is maintain a list of TLDs. Take the host returned by parse_url(), go over the list and look for a match. Strip out the TLD, explode on '.', and the last part will be in the format you want.
This does not seem as efficient as it could be, but with TLDs being added all the time, I cannot see any other deterministic way.
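A sketch of the TLD-list approach described above. domainNameFromSuffixList() and the suffix list are hypothetical; the list here is a tiny sample, and a real one (for example the Public Suffix List at publicsuffix.org) is much larger and changes over time. Longest suffixes are tried first so that co.uk is preferred over uk.

```php
<?php
// Hypothetical helper: strip a known suffix, then return the last remaining label.
function domainNameFromSuffixList(string $host, array $knownSuffixes): ?string
{
    $host = strtolower(trim($host, '.'));
    // Try the longest suffixes first so 'co.uk' wins over 'uk'.
    usort($knownSuffixes, fn($a, $b) => strlen($b) - strlen($a));
    foreach ($knownSuffixes as $suffix) {
        $ending = '.' . $suffix;
        if (substr($host, -strlen($ending)) === $ending) {
            // Remove the suffix and explode on '.'; the last part is the name.
            $rest = explode('.', substr($host, 0, -strlen($ending)));
            return end($rest);
        }
    }
    return null; // no known suffix matched
}

$suffixes = ['com', 'net', 'eu', 'uk', 'co.uk']; // hypothetical sample list
echo domainNameFromSuffixList('www.bbc.co.uk', $suffixes), PHP_EOL; // bbc
echo domainNameFromSuffixList('blabla.com', $suffixes), PHP_EOL;    // blabla
```

The function expects a bare host, so run full URLs through parse_url() first, as the answer says.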
OK... this is messy, and you should spend some time optimizing it and caching previously derived domains. You should also have a friendly name server, and the last catch is that the domain must have an "A" record in its DNS.
This attempts to assemble the domain name in reverse order until it resolves to a DNS "A" record.
At any rate, this was bugging me, so I hope this answer helps:
<?php
$wsHostNames = array(
    "test.com",
    "http://www.bbc.com/news/uk-34276525",
    "google.uk.co"
);
foreach ($wsHostNames as $hostName) {
    echo "checking $hostName" . PHP_EOL;
    // Attempt to strip full URLs down to just the host
    $wsWork = parse_url($hostName, PHP_URL_HOST);
    if ($wsWork != "") {
        echo "Was able to clean up $wsWork" . PHP_EOL;
        $hostName = $wsWork;
    } else {
        // Probably had no scheme/path info or was a malformed URL
        // Try to check it anyway
        echo "No path to strip from $hostName" . PHP_EOL;
    }
    $wsArray = explode(".", $hostName); // Break it up into an array
    $wsHostName = "";
    // Build the domain one segment at a time, starting from the end.
    // The code could be modified to skip checking the last segment (.com) on its own.
    while (!empty($wsArray)) {
        $newSegment = array_pop($wsArray);
        $wsHostName = $newSegment . $wsHostName;
        echo "Checking $wsHostName" . PHP_EOL;
        if (checkdnsrr($wsHostName, "A")) {
            echo "host found $wsHostName" . PHP_EOL;
            echo "Domain is $newSegment" . PHP_EOL;
            continue 2; // move on to the next host name
        } else {
            // This segment didn't resolve; keep building
            echo "No valid A record for $wsHostName" . PHP_EOL;
            $wsHostName = "." . $wsHostName;
        }
    }
    // If you get here, the host name could not be resolved
}
?>
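A sketch of the same reverse-assembly idea as a function. To keep it testable without DNS, the resolver is passed in as a parameter (a hypothetical design choice); in a real run you would pass something like fn($h) => checkdnsrr($h, 'A'), with the same name-server caveats as above. Unlike the loop above, it skips the bare TLD and returns the whole resolving name rather than only its first label.

```php
<?php
// Hypothetical helper: prepend labels from the right until $resolves() accepts the name.
function firstResolvingDomain(string $host, callable $resolves): ?string
{
    $labels = explode('.', strtolower($host));
    // Start from the last label (the TLD) without checking it on its own.
    $candidate = array_pop($labels);
    while (!empty($labels)) {
        $candidate = array_pop($labels) . '.' . $candidate;
        if ($resolves($candidate)) {
            return $candidate;
        }
    }
    return null; // nothing resolved
}

// Fake resolver for illustration: only these names "have" an A record.
$fakeDns = fn(string $h): bool => in_array($h, ['bbc.co.uk', 'test.com'], true);
var_dump(firstResolvingDomain('www.bbc.co.uk', $fakeDns)); // string(9) "bbc.co.uk"
```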
Try preg_replace(), something like:
$domain = preg_replace($regex, '$1', $url);
where $regex matches the whole URL and captures the domain name in group 1.
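The answer does not show its $regex, so the pattern below is a hypothetical one, not the answerer's: it skips an optional scheme and an optional www., captures the next label in group 1, and requires a dot after it. domainViaPregReplace() is likewise a hypothetical name.

```php
<?php
// Hypothetical pattern: optional scheme, optional "www.", name captured in
// group 1, then a dot and everything after it.
$regex = '#^(?:[a-z][a-z0-9+.-]*://)?(?:www\.)?([^./:]+)\..*$#i';

function domainViaPregReplace(string $url, string $regex): ?string
{
    // Replace the whole URL with just the captured name ('$1').
    return preg_replace($regex, '$1', trim($url));
}

echo domainViaPregReplace('http://www.blabla.eu/page', $regex), PHP_EOL; // blabla
```

Like the asker's final test() function, this returns the first label after www., so sub.blabla.com yields sub; a string with no dot is returned unchanged because the pattern does not match.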

Add http or https for user input validation

$xml = $_GET['url'];
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
..
..
If the user enters a URL without http:// or https://, my script breaks. Is concatenating the scheme onto the input a good way to handle this?
The simplest way to do this is to check for the presence of http:// or https:// at the beginning of the string.
if (preg_match('/^http(s)?:\/\//', $xml, $matches) === 1) {
    // For "http://" the optional group does not match, so $matches[1] may be missing
    if (($matches[1] ?? '') === 's') {
        // it's https
    } else {
        // it's http
    }
} else {
    // there is neither http nor https at the beginning
}
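A sketch of the check above used to add a scheme only when it is missing, which is the concatenation the question asks about. ensureScheme() is a hypothetical name, and defaulting to http is an assumption. It does not validate the rest of the URL.

```php
<?php
// Hypothetical helper: prepend a default scheme unless http:// or https:// is present.
function ensureScheme(string $url, string $default = 'http'): string
{
    $url = trim($url);
    if (preg_match('/^https?:\/\//i', $url) === 1) {
        return $url; // already has a scheme
    }
    return $default . '://' . $url;
}

echo ensureScheme('example.com/feed.xml'), PHP_EOL; // http://example.com/feed.xml
```

Only http and https are recognised, matching the answer's pattern; input such as ftp://host would get http:// prepended, so reject other schemes separately if that matters.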
You are using the GET method, so either this is done via AJAX or the user appends the URL to the query string. Aren't you posting a form?
Concatenation isn't going to cut it when the URL is faulty; you need to check for that.
You can put an input with a pattern and a placeholder on the page to "force" the user to include http://. This should be the way to go in HTML5.
<input type="text" pattern="^(https?:\/\/)([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$" placeholder="http://" title="URLs need to be preceded by http:// or https://" >
This should check the URL while forgiving some errors. If a URL isn't up to spec, this will return an error, as it should, and the user should revise the URL.
$xml = $_GET['url'];
$xmlDoc = new DOMDocument();

// Load $url into $doc; returns false instead of failing loudly.
function loadXML(DOMDocument $doc, $url)
{
    // DOMDocument::load() emits warnings and returns false on failure
    return @$doc->load($url);
}

if (!preg_match('/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/', $xml))
{
    echo 'This URL is not valid.';
    exit;
}
else if (!preg_match('/^https?:\/\//', $xml))
{
    // No scheme present: try http:// first
    $orgUrl = $xml;
    $xml = "http://" . $orgUrl;
    $loaded = loadXML($xmlDoc, $xml);
    if (!$loaded)
    {
        // This attempt failed; retry with https://
        $xml = "https://" . $orgUrl;
        $loaded = loadXML($xmlDoc, $xml);
        if (!$loaded)
        {
            echo "Your URL couldn't be retrieved. Are you sure you've entered it correctly?";
            exit;
        }
    }
}
else
{
    $loaded = loadXML($xmlDoc, $xml);
}
You can also use cURL to check the URL before loading the XML:
$ch = curl_init($xml);
// Return the response instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Send the request
curl_exec($ch);
// Check for errors and display the error message
if ($errno = curl_errno($ch)) {
    $error_message = curl_strerror($errno);
    echo "$error_message :: while loading url";
}
// Close the handle
curl_close($ch);
Important side note: using these methods to check whether the URL is available and then take the appropriate action can take a long time, since the server may be slow to respond.

Validation of youtube video through url id

I am using PHP and the YouTube API to determine whether a video exists. I have implemented two functions to help me with this. The issue is that isYoutubeVideo returns null whether the video is valid or invalid. I have checked its parameters, but I am still not sure why it gives me a null value. Is the isYoutubeVideo function set up wrong, missing something, or in need of a change?
function isYoutubeVideo($formatted_url) {
    $isValid = false;
    if (isValidURL($formatted_url)) {
        $idLength = 11;
        $idStarts = strpos($formatted_url, "?v=");
        if ($idStarts === FALSE) {
            $idStarts = strpos($formatted_url, "&v=");
        }
        if ($idStarts !== FALSE) {
            // There is a video ID present; skip the 3-character "?v="/"&v=" prefix
            $videoID = substr($formatted_url, $idStarts + 3, $idLength);
            $headers = get_headers('http://gdata.youtube.com/feeds/api/videos/' . $videoID);
            // Did the request return an HTTP 200 status?
            if ($headers && strpos($headers[0], '200') !== FALSE) {
                $isValid = true;
            }
        }
    }
    return $isValid;
}
Why aren't your first two echo statements also echoing <span id="resultval">?
With your current markup and jQuery code, this is why jQuery populates #special with null.
You need to change your PHP code to:
echo('<div id="special"><span id="resultval">The YouTube video does not exist.</span></div>');
echo('<div id="special"><span id="resultval">that is a valid youtube video</span></div>');
Although, I do not understand why you used such a roundabout way to populate #special. You could have just started with a <p> inside <div id="special"> and then used the selector $("#special > p").
Good luck!
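isYoutubeVideo() above locates the ID with strpos() and substr(). A sketch of the same extraction using parse_url() and parse_str() instead, assuming the URL shapes that appear in these questions (watch?v=ID, /v/ID and youtu.be/ID). extractYoutubeId() is a hypothetical name, and the 11-character check mirrors the $idLength in the question rather than any documented guarantee.

```php
<?php
// Hypothetical helper: returns the video ID, or null if none is found.
function extractYoutubeId(string $url): ?string
{
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url; // parse_url() needs a scheme to find the host
    }
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    $host = strtolower($parts['host']);
    $path = $parts['path'] ?? '';
    $id = null;

    if ($host === 'youtu.be') {
        $id = ltrim($path, '/');                 // youtu.be/ID
    } elseif (preg_match('#^/v/([^/?]+)#', $path, $m)) {
        $id = $m[1];                             // /v/ID
    } elseif (isset($parts['query'])) {
        parse_str($parts['query'], $query);      // watch?v=ID&...
        $id = $query['v'] ?? null;
    }

    // Accept only 11 characters from [A-Za-z0-9_-], as assumed above.
    return (is_string($id) && preg_match('/^[A-Za-z0-9_-]{11}$/', $id)) ? $id : null;
}

var_dump(extractYoutubeId('www.youtube.com/v/NLqAF9hrVbY')); // string(11) "NLqAF9hrVbY"
```

This only extracts the ID; checking that the host is an allowed YouTube domain and that the video exists are separate steps.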

Remove parts of a string with PHP

I have an input box that tells users to enter a link from imgur.com.
I want a script to check that the link is for that site, but I'm not sure how to do it.
The links look like this: http://i.imgur.com/He9hD.jpg
Please note that the text after the / may vary (e.g. it may not be a .jpg), but the main domain is always http://i.imgur.com/.
Any help appreciated.
Thanks, Josh (novice)
Try parse_url():
try {
    if (!preg_match('#^(https?|ftp)://#i', $_POST['url']) AND !substr_count($_POST['url'], '://')) {
        // Handle URLs that do not have a scheme
        $url = sprintf("%s://%s", 'http', $_POST['url']);
    } else {
        $url = $_POST['url'];
    }
    $input = parse_url($url);
    if (!$input OR !isset($input['host'])) {
        // Either the parsing has failed, or the URL was not absolute
        throw new Exception("Invalid URL");
    } elseif ($input['host'] != 'i.imgur.com') {
        // The host does not match
        throw new Exception("Invalid domain");
    }
    // Rebuild the scheme and host, e.g. http://domain.tld
    $host = sprintf("%s://%s", $input['scheme'], $input['host']);
} catch (Exception $e) {
    // Handle error
}
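A sketch of the parse_url() answer as a reusable function. isImgurUrl() is hypothetical; it assumes http when no scheme is given and accepts http and https. Comparing the parsed host exactly also rejects look-alikes such as http://i.imgur.com.evil.example/, which a prefix check without the trailing slash would let through.

```php
<?php
// Hypothetical helper: true only for http(s) URLs whose host is exactly i.imgur.com.
function isImgurUrl(string $input): bool
{
    $url = trim($input);
    if (strpos($url, '://') === false) {
        $url = 'http://' . $url; // assume http when no scheme is given
    }
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'], $parts['scheme'])) {
        return false;
    }
    // Scheme and host names are case-insensitive.
    return in_array(strtolower($parts['scheme']), ['http', 'https'], true)
        && strtolower($parts['host']) === 'i.imgur.com';
}

var_dump(isImgurUrl('http://i.imgur.com/He9hD.jpg')); // bool(true)
```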
substr($input, 0, strlen('http://i.imgur.com/')) === 'http://i.imgur.com/'
Check this, using stripos
if(stripos(trim($url), "http://i.imgur.com")===0){
// the link is from imgur.com
}
Try this:
<?php
if (preg_match('#^http://i\.imgur\.com/#', $_POST['url']))
    echo 'Valid img!';
else
    echo 'Img not valid...';
?>
Where $_POST['url'] is the user input.
I haven't tested this code.
$url_input = $_POST['input_box_name'];
if ( strpos($url_input, 'http://i.imgur.com/') !== 0 )
...
There are several ways of doing it. Here's one:
if ('http://i.imgur.com/' == substr($link, 0, 19)) {
...
}

How to detect the favicon (shortcut icon) of any site via PHP?

How can I detect the favicon (shortcut icon) of any site via PHP?
I can't write a regexp because the markup is different on every site.
You could use this address, putting the domain you want after domain=:
http://www.google.com/s2/favicons?domain=www.example.com
This avoids the problem you were having with regexps and the different markup on each domain.
You can request http://domain.com/favicon.ico with PHP and see if you get a 404.
If you do get a 404, you can parse the website's DOM, looking for a different location referenced in the head element by a link element with rel="icon".
// Helper function to see if a URL returns `200 OK`.
function resourceExists($url) {
    $headers = @get_headers($url);
    if ( ! $headers) {
        return FALSE;
    }
    return (strpos($headers[0], '200') !== FALSE);
}

function domainHasFavicon($domain) {
    // In case they pass 'http://example.com/'.
    $domain = rtrim($domain, '/');
    $request = $domain . '/favicon.ico';
    // Check if the favicon.ico is where it usually is.
    if (resourceExists($request)) {
        return TRUE;
    }
    // If not, fetch the page (requires allow_url_fopen) and parse its DOM.
    $html = @file_get_contents($domain);
    if ($html === FALSE || trim($html) === '') {
        return FALSE;
    }
    $dom = new DOMDocument;
    // Suppress warnings about malformed real-world HTML.
    @$dom->loadHTML($html);
    $head = $dom->getElementsByTagName('head')->item(0);
    if ($head === NULL) {
        return FALSE;
    }
    // Get all `link` elements inside `head`
    foreach ($head->getElementsByTagName('link') as $element) {
        if ( ! $element->hasAttribute('rel')) {
            continue;
        }
        // Split rel on whitespace because it can be `shortcut icon`.
        $rel = preg_split('/\s+/', strtolower($element->getAttribute('rel')));
        if (in_array('icon', $rel)) {
            $href = $element->getAttribute('href');
            // This may be a relative URL; resolve it against the domain
            // (assumes a root-relative or plain relative path).
            if ( ! preg_match('#^https?://#i', $href)) {
                $href = $domain . '/' . ltrim($href, '/');
            }
            return resourceExists($href);
        }
    }
    return FALSE;
}
If you want the URL returned to the favicon.ico, it is trivial to modify the above function.
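A sketch of that modification, split out so the HTML-parsing part can run without network access. findFaviconHref() is a hypothetical helper that takes already-fetched HTML and the site's base URL. It returns the href of the first link whose rel contains the token icon, resolved against the base URL, and otherwise the conventional /favicon.ico location (without checking that it exists).

```php
<?php
// Hypothetical helper: find the favicon URL in an HTML string.
function findFaviconHref(string $html, string $baseUrl): ?string
{
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('link') as $link) {
        // rel may be "icon" or "shortcut icon"; compare whitespace-separated tokens.
        $rel = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $href = $link->getAttribute('href');
        if (in_array('icon', $rel, true) && $href !== '') {
            if (preg_match('#^https?://#i', $href)) {
                return $href; // already absolute
            }
            // Only simple relative paths are handled, not "../" or "//host" forms.
            return rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
        }
    }
    // Fall back to the conventional location (not checked here).
    return rtrim($baseUrl, '/') . '/favicon.ico';
}

$html = '<html><head><link rel="Shortcut Icon" href="/img/fav.png"></head></html>';
echo findFaviconHref($html, 'http://example.com/'), PHP_EOL; // http://example.com/img/fav.png
```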
$address = 'http://www.youtube.com/';
$domain = parse_url($address, PHP_URL_HOST);
or from a database
$domain = parse_url($row['address_column'], PHP_URL_HOST);
display with
echo '<img src="http://www.google.com/s2/favicons?domain=' . $domain . '" />';
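A sketch combining the parse_url() and display snippets into one helper. googleFaviconUrl() is a hypothetical name; the s2/favicons address comes from the answers here and is a third-party Google service, so its availability and behaviour are not guaranteed. The host is URL-encoded, and the final URL is HTML-escaped before it goes into the attribute.

```php
<?php
// Hypothetical helper: build the Google favicon-service URL for an address.
function googleFaviconUrl(string $address): ?string
{
    $host = parse_url($address, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // parse_url() needs a scheme, e.g. 'http://www.youtube.com/'
    }
    return 'http://www.google.com/s2/favicons?domain=' . rawurlencode($host);
}

$src = googleFaviconUrl('http://www.youtube.com/');
if ($src !== null) {
    // Escape the URL before putting it in an HTML attribute.
    echo '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '" alt="">', PHP_EOL;
}
```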
