Remove parts of a string with PHP

I have an input box that tells users to enter a link from imgur.com.
I want a script to check that the link is for the specified site, but I'm not sure how to do it.
The links are as follows: http://i.imgur.com/He9hD.jpg
Please note that the text after the last / may vary (e.g. it may not be a .jpg), but the main domain is always http://i.imgur.com/.
Any help appreciated.
Thanks, Josh. (Novice)

Try parse_url():
try {
    if (!preg_match('#^(https?|ftp)://#', $_POST['url']) && !substr_count($_POST['url'], '://')) {
        // Handle URLs that do not have a scheme
        $url = sprintf("%s://%s", 'http', $_POST['url']);
    } else {
        $url = $_POST['url'];
    }
    $input = parse_url($url);
    if (!$input || !isset($input['host'])) {
        // Either the parsing has failed, or the URL was not absolute
        throw new Exception("Invalid URL");
    } elseif ($input['host'] != 'i.imgur.com') {
        // The host does not match
        throw new Exception("Invalid domain");
    }
    // Rebuild scheme and host, e.g. http://domain.tld
    $host = sprintf("%s://%s", $input['scheme'], $input['host']);
} catch (Exception $e) {
    // Handle error
}

substr($url, 0, strlen('http://i.imgur.com/')) === 'http://i.imgur.com/'

Check this, using stripos():
if (stripos(trim($url), "http://i.imgur.com") === 0) {
    // the link is from imgur.com
}

Try this:
<?php
if (preg_match('#^http://i\.imgur\.com/#', $_POST['url']))
    echo 'Valid img!';
else
    echo 'Img not valid...';
?>
Where $_POST['url'] is the user input.
I haven't tested this code.

$url_input = $_POST['input_box_name'];
if ( strpos($url_input, 'http://i.imgur.com/') !== 0 )
...

Several ways of doing it. Here's one:
if ('http://i.imgur.com/' == substr($link, 0, 19)) {
    ...
}

Related

PHP Strip domain name from url

I know there is a LOT of info on the web regarding this subject, but I can't seem to figure it out the way I want.
I'm trying to build a function which strips the domain name from a URL:
http://blabla.com → blabla
www.blabla.net → blabla
http://www.blabla.eu → blabla
Only the plain name of the domain is needed.
With parse_url I get the domain filtered out, but that is not enough.
I have 3 functions that strip the domain, but I still get some wrong outputs:
function prepare_array($domains)
{
    $prep_domains = explode("\n", str_replace("\r", "", $domains));
    $domain_array = array_map('trim', $prep_domains);
    return $domain_array;
}
function test($domain)
{
    $domain = explode(".", $domain);
    return $domain[1];
}
function strip($url)
{
    $url = trim($url);
    $url = preg_replace("/^(http:\/\/)*(www.)*/is", "", $url);
    $url = preg_replace("/\/.*$/is", "", $url);
    return $url;
}
Every possible domain, URL and extension is allowed. After the function has finished, it must return an array of only the domain names themselves.
UPDATE:
Thanks for all the suggestions!
I figured it out with the help from you all.
function test($url)
{
    // Check if the url begins with http://, www., or both
    // If so, replace it
    if (preg_match("/^(http:\/\/|www\.)/i", $url))
    {
        $domain = preg_replace("/^(http:\/\/)*(www\.)*/is", "", $url);
    }
    else
    {
        $domain = $url;
    }
    // Now all that's left is the domain and the extension
    // Only return the needed first part without the extension
    $domain = explode(".", $domain);
    return $domain[0];
}
How about:
$wsArray = explode(".", $domain); // Break it up into an array.
$extension = array_pop($wsArray); // Get the extension (last entry)
$domain = array_pop($wsArray);    // Get the domain
http://php.net/manual/en/function.array-pop.php
Ah, your problem lies in the fact that TLDs can be in either one or two parts, e.g. .com vs .co.uk.
What I would do is maintain a list of TLDs. With the result after parse_url, go over the list and look for a match. Strip out the TLD, explode on '.', and the last part will be in the format you want.
This does not seem as efficient as it could be, but with TLDs being added all the time, I cannot see any other deterministic way.
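A minimal sketch of that idea (the TLD list here is illustrative and incomplete; a real implementation would use the Public Suffix List):

```php
<?php
// Illustrative, incomplete TLD list. Longer suffixes must come
// before their shorter tails (co.uk before uk) so they match first.
$tlds = array('co.uk', 'org.uk', 'com', 'net', 'org', 'eu', 'uk');

function domain_name($url, array $tlds)
{
    $host = parse_url($url, PHP_URL_HOST);
    if ($host === null || $host === false) {
        $host = $url; // no scheme given, so treat the whole input as a host
    }
    foreach ($tlds as $tld) {
        $suffix = '.' . $tld;
        if (substr($host, -strlen($suffix)) === $suffix) {
            $host = substr($host, 0, -strlen($suffix)); // strip the TLD
            break;
        }
    }
    $parts = explode('.', $host);
    return end($parts); // the last remaining label is the plain name
}

echo domain_name('http://www.bbc.co.uk/news', $tlds) . "\n"; // bbc
echo domain_name('www.blabla.net', $tlds) . "\n";            // blabla
```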
Ok... this is messy, and you should spend some time optimizing and caching previously derived domains. You also need a friendly nameserver, and the last catch is that the domain must have an "A" record in its DNS.
This attempts to assemble the domain name in reverse order until it can resolve to a DNS "A" record.
At any rate, this was bugging me, so I hope this answer helps:
<?php
$wsHostNames = array(
    "test.com",
    "http://www.bbc.com/news/uk-34276525",
    "google.uk.co"
);
foreach ($wsHostNames as $hostName) {
    echo "checking $hostName" . PHP_EOL;
    //attempt to strip out full paths to just the host
    $wsWork = parse_url($hostName, PHP_URL_HOST);
    if ($wsWork != "") {
        echo "Was able to clean up $wsWork" . PHP_EOL;
        $hostName = $wsWork;
    } else {
        //Probably had no path info or malformed URL
        //Try to check it anyway
        echo "No path to strip from $hostName" . PHP_EOL;
    }
    $wsArray = explode(".", $hostName); //Break it up into an array.
    $wsHostName = "";
    //Build the domain one segment at a time
    //Code should be modified not to check the first segment (.com)
    while (!empty($wsArray)) {
        $newSegment = array_pop($wsArray);
        $wsHostName = $newSegment . $wsHostName;
        echo "Checking $wsHostName" . PHP_EOL;
        if (checkdnsrr($wsHostName, "A")) {
            echo "host found $wsHostName" . PHP_EOL;
            echo "Domain is $newSegment" . PHP_EOL;
            continue 2;
        } else {
            //This segment didn't resolve - keep building
            echo "No valid A record for $wsHostName" . PHP_EOL;
            $wsHostName = "." . $wsHostName;
        }
    }
    //if you get to here, the loop could not resolve the host name
}
?>
Try with preg_replace(). Something like:
$domain = preg_replace($regex, '$1', $url);

Add http or https for user input validation

$xml = $_GET['url'];
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
..
..
If the user enters the URL without http or https, my script breaks. Is concatenation a good way to validate it in this case?
The simplest way of doing this is checking for the presence of http:// or https:// at the beginning of the string.
if (preg_match('/^http(s)?:\/\//', $xml, $matches) === 1) {
    if (isset($matches[1]) && $matches[1] === 's') {
        // it's https
    } else {
        // it's http
    }
} else {
    // there is neither http nor https at the beginning
}
You are using the GET method. Is this done by AJAX, or does the user append a URL to the query string? You are not posting a form?
Concatenation isn't going to cut it when the URL is faulty. You need to check for this.
You can put an input with a pattern and placeholder on the page, to "force" the user to use http://. This should be the way to go in HTML5.
<input type="text" pattern="^(https?:\/\/)([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$" placeholder="http://" title="URLs need to be preceded by http:// or https://">
This should check and forgive some errors. If a URL isn't up to spec, this will return an error, as it should. The user should then revise their URL.
$xml = $_GET['url'];
$xmlDoc = new DOMDocument();
if (!preg_match('/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/', $xml))
{
    echo 'This url is not valid.';
    exit;
}
else if (!preg_match('/^http(s)?:\/\//', $xml))
{
    //no http present
    $orgUrl = $xml;
    $xml = "http://" . $orgUrl;
    //extended to cope with https://
    $loaded = loadXML();
    if (substr($loaded, 0, 5) == "false")
    {
        //this attempt failed.
        $xml = "https://" . $orgUrl;
        $loaded = loadXML();
        if (substr($loaded, 0, 5) == "false")
        {
            echo substr($loaded, 6);
            exit;
        }
    }
}
else
{
    $loaded = loadXML();
}
function loadXML()
{
    global $xmlDoc, $xml;
    try {
        return $xmlDoc->load($xml);
    }
    catch (Exception $ex)
    {
        return 'false Your url couldn\'t be retrieved. Are you sure you\'ve entered it correctly?';
    }
}
You can also use curl to check the url before loading the xml:
$ch = curl_init($xml);
// Send request
curl_exec($ch);
// Check for errors and display the error message
if ($errno = curl_errno($ch)) {
    $error_message = curl_strerror($errno);
    echo "$error_message :: while loading url";
}
// Close the handle
curl_close($ch);
Important side note: using these methods to check whether the url is available and then take the appropriate action can take a very long time, since the server response can take a while to return.
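To keep that wait bounded, you can set timeouts on the handle and skip the body download. This is a sketch; the 5- and 10-second values are arbitrary examples, and the URL variable is assumed to hold the user's input:

```php
<?php
$url = 'http://www.example.com/'; // sample input; substitute the user's URL

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_NOBODY, true);         // headers only, no body download
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // don't echo the response
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // give up connecting after 5 seconds
curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // abort the whole request after 10 seconds
curl_exec($ch);
if ($errno = curl_errno($ch)) {
    echo curl_strerror($errno) . " :: while loading url";
}
curl_close($ch);
```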

PHP read content of url with japanese word

Hi, I want to read the content of a web URL that has a Japanese word in it.
My existing code is as below:
$url = "http://fantasticlife稼ぐ777.tokyo";
$responseText = "";
try {
    $responseText = @file_get_contents($url);
    var_dump($responseText);
} catch (\Exception $e) {
    echo $e->getMessage();
}
I am getting the following output:
bool(false)
My concern is where things went wrong. The above code works fine for normal URLs.
Thanks in advance.
Thanks,
Solved it by converting the domain name to IDNA ASCII form with the idn_to_ascii() function. The code snippet is as below:
if (strpos($url, "http://") !== false) {
    $url = "http://" . idn_to_ascii(str_replace("http://", "", $url));
} else if (strpos($url, "https://") !== false) {
    $url = "https://" . idn_to_ascii(str_replace("https://", "", $url));
} else {
    $url = idn_to_ascii($url);
}
Thanks once again. :)
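A variant of the same idea that uses parse_url() so only the host is converted and the path is left untouched. This is a sketch, and it assumes the intl extension (which provides idn_to_ascii()) is installed:

```php
<?php
// Convert only the host part of a URL to IDNA ASCII (punycode),
// leaving the scheme and path untouched. Requires the intl extension.
function idn_url($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        // No scheme present: treat the whole input as a bare host name.
        return idn_to_ascii($url);
    }
    return $parts['scheme'] . '://' . idn_to_ascii($parts['host'])
         . (isset($parts['path']) ? $parts['path'] : '');
}

echo idn_url('http://müller.de/page') . "\n"; // http://xn--mller-kva.de/page
```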

validate url in php

I have the following seemingly simple code in PHP, but the problem is that it shows all valid links as "not valid"; any help appreciated:
<?php
$m = "urllist.txt";
$n = fopen($m, "r");
while (!feof($n)) {
    $l = fgets($n);
    if (filter_var($l, FILTER_VALIDATE_URL) === FALSE) {
        echo "NOT VALID - $l<br>";
    } else {
        echo "VALID - $l<br>";
    }
}
fclose($n);
?>
The string returned by fgets() contains a trailing newline character that needs to be trimmed before you can validate it. Try out following code, I hope this will help you:
<?php
$m = "urllist.txt";
$n = fopen($m, "r");
while (!feof($n)) {
    $l = fgets($n);
    if (filter_var(trim($l), FILTER_VALIDATE_URL)) {
        echo "VALID - $l<br>";
    } else {
        echo "NOT VALID - $l<br>";
    }
}
fclose($n);
?>
I have tried with following urls:
http://stackoverflow.com/
https://www.google.co.in/
https://www.google.co.in/?gfe_rd=cr&ei=bf4HVLOmF8XFoAOg_4HoCg&gws_rd=ssl
www.google.com
http://www.example.com
example.php?name=Peter&age=37
and get following result:
VALID - http://stackoverflow.com/
VALID - https://www.google.co.in/
VALID - https://www.google.co.in/?gfe_rd=cr&ei=bf4HVLOmF8XFoAOg_4HoCg&gws_rd=ssl
NOT VALID - www.google.com
VALID - http://www.example.com
NOT VALID - example.php?name=Peter&age=37
Maybe you have some symbols (such as a trailing '\n') at the end of each line.
I think you can just use the trim function before validating $l, like this:
filter_var(trim($l), FILTER_VALIDATE_URL) !== FALSE
(Note that filter_var() returns the filtered URL on success, not TRUE, so don't compare against TRUE.)
Maybe this will help you.
Please try with the different filters available to see where it fails:
FILTER_FLAG_SCHEME_REQUIRED - Requires URL to be an RFC compliant URL (like http://example)
FILTER_FLAG_HOST_REQUIRED - Requires URL to include host name (like http://www.example.com)
FILTER_FLAG_PATH_REQUIRED - Requires URL to have a path after the domain name (like www.example.com/example1/test2/)
FILTER_FLAG_QUERY_REQUIRED - Requires URL to have a query string (like "example.php?name=Peter&age=37")
(cc of http://www.w3schools.com/php/filter_validate_url.asp)
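For instance, a quick check of the path flag (note that filter_var() returns the URL itself on success and FALSE on failure; the sample URLs are just illustrations):

```php
<?php
// FILTER_FLAG_PATH_REQUIRED rejects URLs that have no path component.
$withPath    = filter_var('http://www.example.com/page', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);
$withoutPath = filter_var('http://www.example.com',      FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);

var_dump($withPath !== false);    // the URL has a path, so it passes
var_dump($withoutPath !== false); // no path, so the flag rejects it
```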
You can try the good old regex too:
if (!preg_match("/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i", $url))
Try this code. It must be helpful. I have tested it and it's working.
<?php
$m = "urllist.txt";
$n = fopen($m, "r");
while (!feof($n)) {
    $l = fgets($n);
    if (filter_var(trim($l), FILTER_VALIDATE_URL)) {
        echo "URL is valid";
    } else {
        echo "URL is not valid";
    }
}
fclose($n);
?>

PHP: Array comparison issue- URLs

I have put together PHP code that will report whether a YouTube video is valid or invalid. Since I only want URLs from the YouTube domain, I have set up an array to catch any other URL and return an error. The problem is that when I type a URL in a format like www.youtube.com/v/NLqAF9hrVbY I get the last echo error, but when I add the http in front of that URL it works fine. I am checking the $url with PHP_URL_HOST.
Why is it not accepting URLs of the allowed domain without the http?
PHP
if ($_POST) {
    $url = $_POST['name'];
    if (!empty($url)) {
        $allowed_domains = array(
            'www.youtube.com',
            'gdata.youtube.com',
            'youtu.be',
            'youtube.com',
            'm.youtube.com'
        );
        if (in_array(parse_url($url, PHP_URL_HOST), $allowed_domains)) {
            $formatted_url = getYoutubeVideoID($url);
            $parsed_url = parse_url($formatted_url);
            parse_str($parsed_url['query'], $parsed_query_string);
            $videoID = $parsed_query_string['v'];
            $headers = get_headers('http://gdata.youtube.com/feeds/api/videos/' . $videoID);
            if ($videoID != null) {
                if (strpos($headers[0], '200')) {
                    echo('<div id="special"><span id="resultval">That is a valid youtube video</span></div>');
                } else {
                    echo('<div id="special"><span id="resultval">The YouTube video does not exist.</span></div>');
                    return false;
                }
            } else {
                echo('<div id="special"><span id="resultval">The YouTube video does not exist.</span></div>');
                return false;
            }
        } else {
            echo('<div id="special"><span id="resultval">Please include a video URL from Youtube.</span></div>');
        }
    }
}
?>
parse_url() needs to be given valid URLs with the protocol identifier (scheme, e.g. http) present. This is why the comparison fails.
You can fix this as follows:
if (substr($url, 0, 4) != 'http')
    $url = "http://" . $url;
Use the above before performing:
if (in_array(parse_url($url, PHP_URL_HOST), $allowed_domains)) { ... }
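Putting the two steps together (the allowed-domains list is the one from the question, and the sample URL is the one that previously failed):

```php
<?php
$allowed_domains = array(
    'www.youtube.com',
    'gdata.youtube.com',
    'youtu.be',
    'youtube.com',
    'm.youtube.com'
);

$url = 'www.youtube.com/v/NLqAF9hrVbY'; // sample input without a scheme

// Prepend a scheme so parse_url() can find the host.
if (substr($url, 0, 4) != 'http') {
    $url = 'http://' . $url;
}

if (in_array(parse_url($url, PHP_URL_HOST), $allowed_domains)) {
    echo "allowed domain\n";
} else {
    echo "not an allowed domain\n";
}
```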
