$xml = $_GET['url'];
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
..
..
If the user enters a URL without http:// or https://, my script breaks. Is concatenation a good way to handle validation in this case?
The simplest way of doing this is checking for the presence of http:// or https:// at the beginning of the string.
if (preg_match('/^http(s)?:\/\//', $xml, $matches) === 1) {
if ($matches[1] === 's') {
// it's https
} else {
// it's http
}
} else {
// there is neither http nor https at the beginning
}
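A minimal sketch of the concatenation approach the question asks about, prepending a scheme only when none is present. The function name `withDefaultScheme` is hypothetical, and it assumes plain http is an acceptable default; it does not validate the rest of the URL.

```php
<?php
// Hypothetical helper: prepend a default scheme only when the input
// does not already start with http:// or https://
function withDefaultScheme($url, $scheme = 'http')
{
    $url = trim($url);
    // Case-insensitive check for an existing http:// or https:// prefix
    if (preg_match('~^https?://~i', $url) === 1) {
        return $url;
    }
    return $scheme . '://' . $url;
}

$xml = withDefaultScheme('example.com/feed.xml');
echo $xml, "\n";
```

Blind concatenation has a catch: input such as `ftp://host` becomes `http://ftp://host`, so you still want some validation of the result before loading it.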
You are using the GET method, so either this is done via AJAX or the user appends the URL to the query string. You are not posting a form?
Concatenation isn't going to cut it when the URL is malformed. You need to check for that.
You can put an input with a placeholder and a pattern on the page to "force" the user to start with http://. In HTML5 this should be the way to go.
<input type="text" pattern="^(https?:\/\/)([\da-z\.\-]+)\.([a-z\.]{2,6})([\/\w \.\-]*)*\/?$" placeholder="http://" title="URLs need to be preceded by http:// or https://" >
The PHP below checks the URL and forgives some errors (such as a missing http://). If a URL isn't up to spec, it reports an error, as it should, and the user should revise the URL.
$xml = $_GET['url'];
$xmlDoc = new DOMDocument();
// Collect libxml warnings internally instead of printing them
libxml_use_internal_errors(true);
if (!preg_match('/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/', $xml))
{
    echo 'This url is not valid.';
    exit;
}
else if (!preg_match('/^https?:\/\//', $xml))
{
    //no http present
    $orgUrl = $xml;
    $xml = "http://".$orgUrl;
    //extended to cope with https://
    $loaded = loadXML($xmlDoc, $xml);
    if (!$loaded)
    {
        //this attempt failed, so retry with https://
        $xml = "https://".$orgUrl;
        $loaded = loadXML($xmlDoc, $xml);
        if (!$loaded)
        {
            echo 'Your url couldn\'t be retrieved. Are you sure you\'ve entered it correctly?';
            exit;
        }
    }
}
else
{
    $loaded = loadXML($xmlDoc, $xml);
}
// DOMDocument::load() returns false on failure instead of throwing,
// so pass the document and URL in explicitly and return a boolean
function loadXML(DOMDocument $doc, $url)
{
    return $doc->load($url) !== false;
}
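The http-then-https retry chain can also be written as a loop over candidate URLs. This is a sketch with a hypothetical `loadFirstAvailable` helper; it relies on `libxml_use_internal_errors()` so failed attempts don't print warnings, and on `DOMDocument::load()` returning false on failure (it accepts local file paths as well as URLs).

```php
<?php
// Hypothetical helper: try each candidate URL (or file path) in order and
// return the first DOMDocument that loads successfully, or null if none do.
function loadFirstAvailable(array $candidates)
{
    // Collect libxml warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $result = null;
    foreach ($candidates as $candidate) {
        $doc = new DOMDocument();
        if ($doc->load($candidate) !== false) {
            $result = $doc;
            break;
        }
        libxml_clear_errors(); // discard the errors from this failed attempt
    }
    libxml_use_internal_errors($previous); // restore the caller's setting
    return $result;
}
```

Usage for input without a scheme: `$xmlDoc = loadFirstAvailable(array('http://' . $xml, 'https://' . $xml));` and then check for `null`. Adding more fallbacks is just a matter of extending the array.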
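You can also use cURL to check the URL before loading the XML:
$ch = curl_init($xml);
// Only request the headers; don't echo the response body
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Treat HTTP status codes of 400 and above (404, 503, ...) as errors
curl_setopt($ch, CURLOPT_FAILONERROR, true);
// Send request
curl_exec($ch);
// Check for errors and display the error message
if ($errno = curl_errno($ch)) {
    $error_message = curl_strerror($errno); // curl_strerror() requires PHP 5.5+
    echo "$error_message :: while loading url";
}
// Close the handle
curl_close($ch);
Important side note: using these methods to check whether the URL is available before taking the appropriate action can take a long time, since the server's response may be slow to arrive.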
Related
So I tried to get a fix for this earlier, but I think we were all going in the wrong direction. I'm trying to check two servers to make sure that at least one of them is active before making a call to it. The service provides a page for each server that simply has "OK" inside a div with id="server_status". When I try to loadHTMLFile into a variable, it returns true, but I can never pull the element I need from it. After some output testing with saveHTML(), it appears that the variable holding the DOMDocument is empty. Here's my code:
$servers = array('tpeweb.paybox.com', // primary URL
'tpeweb1.paybox.com'); // backup URL
foreach($servers as $server){
$doc = new DOMDocument();
$doc->validateOnParse = true;
$doc->loadHTMLFile('https://'.$server.'/load.html');
$server_status = "";
$docText = $doc->saveHTML();
if($doc) {
echo "HTML should output here: ";
echo $docText;
}
if(!$doc) {
echo "HTML file not loaded";
}
$element = $doc->getElementById('server_status');
if($element){
$server_status = $element->textContent;
}
if($server_status == "OK"){
// Server is up and services are available
return array(true, 'https://'.$server.'/cgi/MYchoix_pagepaiement.cgi');
}
}
return array(false, 'e404.html');
All I get as output is "HTML should output here: " twice, and then it returns the array at the bottom. This is the code that they provided:
$servers = array('tpeweb.paybox.com', // primary URL
'tpeweb1.paybox.com'); // backup URL
$serverOK = "";
foreach($servers as $server){
$doc = new DOMDocument();
$doc->loadHTMLFile('https://'.$server.'/load.html');
$server_status = "";
$element = $doc->getElementById('server_status');
if($element){
$server_status = $element->textContent;
}
if($server_status == "OK"){
// Server is up and services are available
$serverOK = $server;
break;
}
// else : Server is up but services are not available .
}
if(!$serverOK){
die("Error : no server found");
}
echo 'Connecting to https://'.$server.'/cgi/MYchoix_pagepaiement.cgi';
This also seems to be having the same problem. Could it be something with my PHP configuration? I'm on version 5.3.6.
Thanks,
Adrian
EDIT:
I tried it by passing the HTML in as a string instead of fetching it from the server, and it worked fine. However, fetching the HTML into a string and passing that to the PHP function results in the same issue. Any fixes?
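Since parsing works when the HTML is passed in as a string, one way to narrow this down is to keep fetching and parsing separate, so a failed fetch shows up as `false` rather than as an empty document (one thing worth ruling out: `https://` URLs need PHP's openssl extension). The sketch below uses hypothetical names (`serverStatus`, `pickServer`) and an injectable `$fetch` callable; in real use you might pass `function ($url) { return @file_get_contents($url); }`. It looks the element up with DOMXPath rather than `getElementById()`.

```php
<?php
// Hypothetical helper: return the trimmed text of the element with
// id="server_status", or '' if the HTML is empty or the element is missing.
function serverStatus($html)
{
    if (!is_string($html) || trim($html) === '') {
        return '';
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate sloppy HTML
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//*[@id="server_status"]');
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : '';
}

// Hypothetical helper: return the first server whose status page says "OK",
// or null. $fetch takes a URL and returns the page body, or false on failure.
function pickServer(array $servers, $fetch)
{
    foreach ($servers as $server) {
        $html = call_user_func($fetch, 'https://' . $server . '/load.html');
        if ($html === false) {
            continue; // the request itself failed; try the next server
        }
        if (serverStatus($html) === 'OK') {
            return $server;
        }
    }
    return null;
}
```

Injecting `$fetch` makes it easy to test the parsing with canned HTML and to log or inspect exactly what the server returned.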
I have some code that fetches publicly available data from a website:
//Array of params
foreach($params as $par){
$html = file_get_html('WEBSITE.COM/$par');
$name = $html->find('div[class=name]');
$link = $html->find('div[class=secondName]');
foreach($link as $i => $result2)
{
$var = $name[$i]->plaintext;
echo $result2->href,"<br>";
//Insert to database
}
}
On each pass of the loop it goes to the given website with a different parameter in the URL. I keep getting errors that break the script when a 404 comes up or the server is temporarily unavailable. I have tried code that checks the headers and checks whether $html is an object first, but I still get the errors. Is there a way I can just skip the failing pages and carry on with the script?
Code I have tried for checking the headers:
function url_exists($url){
if ((strpos($url, "http")) === false) $url = "http://" . $url;
$headers = @get_headers($url);
//print_r($headers);
if (is_array($headers)){
//Check for http error here....should add checks for other errors too...
if(strpos($headers[0], '404 Not Found'))
return false;
else
return true;
}
else
return false;
}
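Checking for the literal string '404 Not Found' misses other failures such as 503 Service Unavailable. A sketch that extracts the numeric status code from the first line returned by `get_headers()` instead; `httpStatusCode` and `urlIsOk` are hypothetical names, and the parser assumes the usual `HTTP/x NNN Reason` status-line form. When a URL redirects, `get_headers()` returns the headers of each response in turn, so `$headers[0]` holds the redirect's status.

```php
<?php
// Hypothetical helper: extract the numeric status code from a status line
// such as "HTTP/1.1 404 Not Found"; returns null if it can't be parsed.
function httpStatusCode($statusLine)
{
    if (preg_match('~^HTTP/\S+\s+(\d{3})~', $statusLine, $m) === 1) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: treat only 2xx responses as "the page exists".
function urlIsOk($url)
{
    $headers = @get_headers($url);
    if (!is_array($headers) || !isset($headers[0])) {
        return false; // the request failed entirely (DNS error, timeout, ...)
    }
    $code = httpStatusCode($headers[0]);
    return $code !== null && $code >= 200 && $code < 300;
}
```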
Code I have tried for checking whether $html is an object:
if (method_exists($html,"find")) {
// then check if the html element exists to avoid trying to parse non-html
if ($html->find('html')) {
// and only then start searching (and manipulating) the dom
// ...
}
}
You need to be more specific: what kind of errors are you getting? Which line errors out?
Edit: Since you did specify the errors you're getting, here's what to do:
I've noticed you're using SINGLE quotes around a string that contains a variable. PHP does not interpolate variables inside single-quoted strings, so use double quotes instead, i.e.:
$html = file_get_html("WEBSITE.COM/$par");
Perhaps this is the issue?
Also, you could use file_get_contents() to check that the page can be fetched:
if (file_get_contents("WEBSITE.COM/$par") !== false) {
...
}
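To skip a failing page and carry on, fetch each URL once, check the result, and `continue` on failure. A sketch with hypothetical helpers `fetchOrNull` and `fetchAll`; the `@` operator silences the warning `file_get_contents()` emits on a 404 or connection failure. With simple_html_dom you could then pass each fetched string to its `str_get_html()` function instead of calling `file_get_html()`, so the page is not requested twice.

```php
<?php
// Hypothetical helper: return the body of $url, or null if it can't be fetched.
function fetchOrNull($url)
{
    $body = @file_get_contents($url); // '@' hides the warning on 404/timeout
    return $body === false ? null : $body;
}

// Hypothetical helper: fetch each page once; skip failures instead of aborting.
function fetchAll(array $urls)
{
    $pages = array();
    foreach ($urls as $url) {
        $body = fetchOrNull($url);
        if ($body === null) {
            continue; // log it if you like, then move on to the next URL
        }
        $pages[$url] = $body;
    }
    return $pages;
}
```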
So I'm grabbing some information from an XML file like so:
$url = "http://myurl.blah";
$xml = simplexml_load_file($url);
Except sometimes the XML file is empty, and I need the code to fail gracefully, but I can't figure out how to catch the PHP error. I tried this:
if(isset(simplexml_load_file($url)));
{
$xml = simplexml_load_file($url);
/*rest of code using $xml*/
}
else {
echo "No info available.";
}
But it doesn't work. I guess you can't use isset that way. Does anyone know how to catch the error?
$xml = file_get_contents("http://myurl.blah");
if (trim($xml) == '') {
die('No content');
}
$xml = simplexml_load_string($xml);
Or, possibly slightly more efficient, but not necessarily recommended because it silences errors:
$xml = @simplexml_load_file($url);
if (!$xml) {
die('error');
}
Don't use isset here.
// Suppress errors (I know it's bad)
$xml = @simplexml_load_file($url);
// Check that a response was fetched and parsed
if (false !== $xml) {
    //rest of code using $xml
} else {
    echo "No info available.";
}
if (($xml = simplexml_load_file($url)) !== false) {
// Everything is OK. Use $xml object.
} else {
// Something has gone wrong!
}
From the PHP manual's page on libxml error handling:
var_dump(libxml_use_internal_errors(true));
// load the document
$doc = new DOMDocument;
if (!$doc->load('file.xml')) {
foreach (libxml_get_errors() as $error) {
// handle errors here
}
libxml_clear_errors();
}
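The same libxml error handling works with SimpleXML. A sketch with a hypothetical `loadXmlOrErrors` helper that returns the parsed element on success or the list of libxml error messages on failure; it uses `simplexml_load_string()` so an empty response is caught before parsing.

```php
<?php
// Hypothetical helper: parse $xmlString; returns array(SimpleXMLElement, array())
// on success or array(null, array of error messages) on failure.
function loadXmlOrErrors($xmlString)
{
    if (!is_string($xmlString) || trim($xmlString) === '') {
        return array(null, array('No content'));
    }
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);
    $messages = array();
    if ($xml === false) {
        foreach (libxml_get_errors() as $error) {
            $messages[] = trim($error->message);
        }
        $xml = null;
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return array($xml, $messages);
}
```

Usage with a remote file: `list($xml, $errors) = loadXmlOrErrors(@file_get_contents($url));` A failed fetch returns false, which the helper reports as "No content".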
I have an input box that tells users to enter a link from imgur.com.
I want a script to check that the link is for that site, but I'm not sure how to do it.
The links look like this: http://i.imgur.com/He9hD.jpg
Please note that the text after the / may vary (e.g. it may not be a .jpg), but the domain is always http://i.imgur.com/.
Any help appreciated.
Thanks, Josh (novice)
Try parse_url()
try {
    if (!preg_match('~^(https?|ftp)://~i', $_POST['url']) AND !substr_count($_POST['url'], '://')) {
// Handle URLs that do not have a scheme
$url = sprintf("%s://%s", 'http', $_POST['url']);
} else {
$url = $_POST['url'];
}
$input = parse_url($url);
if (!$input OR !isset($input['host'])) {
// Either the parsing has failed, or the URL was not absolute
throw new Exception("Invalid URL");
} elseif ($input['host'] != 'i.imgur.com') {
// The host does not match
throw new Exception("Invalid domain");
}
// Prepend URL with scheme, e.g. http://domain.tld
$host = sprintf("%s://%s", $input['scheme'], $input['host']);
} catch (Exception $e) {
// Handle error
}
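A compact variant of the parse_url() approach as a reusable function. `isImgurUrl` is a hypothetical name; it assumes `i.imgur.com` over http or https is the only accepted form and lowercases before comparing. Comparing the parsed host rather than a string prefix also rejects look-alikes such as `http://i.imgur.com.example.net/x.jpg`, which a prefix check without the trailing slash would accept.

```php
<?php
// Hypothetical helper: true if $url points at i.imgur.com over http or https.
function isImgurUrl($url)
{
    $url = trim($url);
    // Assume http:// when the user left the scheme off
    if (strpos($url, '://') === false) {
        $url = 'http://' . $url;
    }
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // unparseable or not an absolute URL
    }
    $scheme = strtolower($parts['scheme']);
    return ($scheme === 'http' || $scheme === 'https')
        && strtolower($parts['host']) === 'i.imgur.com';
}
```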
substr($input, 0, strlen('http://i.imgur.com/')) === 'http://i.imgur.com/'
Check this using stripos() (keep the trailing slash so that hosts such as i.imgur.com.example.net don't match):
if(stripos(trim($url), "http://i.imgur.com/")===0){
// the link is from imgur.com
}
Try this:
<?php
if(preg_match('#^http://i\.imgur\.com/#', $_POST['url']))
echo 'Valid img!';
else
echo 'Img not valid...';
?>
Where $_POST['url'] is the user input.
I haven't tested this code.
$url_input = $_POST['input_box_name'];
if ( strpos($url_input, 'http://i.imgur.com/') !== 0 )
...
There are several ways of doing it. Here's one:
if ('http://i.imgur.com/' == substr($link, 0, 19)) {
...
}
How can I detect the favicon (shortcut icon) of any site via PHP?
I can't write a regexp because the markup is different on each site.
You could use this address and drop the domain into it instead of using a regexp:
http://www.google.com/s2/favicons?domain=www.example.com
This sidesteps the problem you were having with regexps and the different markup on each domain.
You can request http://domain.com/favicon.ico with PHP and see if you get a 404.
If you get a 404 there, you can parse the website's DOM, looking for a different location referenced in the head element by a link element with rel="icon".
// Helper function to see if a url returns `200 OK`.
function resourceExists($url) {
    $headers = @get_headers($url);
    if ( ! $headers) {
        return FALSE;
    }
    return (strpos($headers[0], '200') !== FALSE);
}
function domainHasFavicon($domain) {
    // In case they pass 'http://example.com/'.
    $domain = rtrim($domain, '/');
    $request = $domain . '/favicon.ico';
    // Check if the favicon.ico is where it usually is.
    if (resourceExists($request)) {
        return TRUE;
    }
    // If not, we'll fetch the page, parse the DOM and find it
    $html = @file_get_contents($domain);
    if ($html === FALSE) {
        return FALSE;
    }
    $dom = new DOMDocument;
    // Real-world HTML is rarely valid; don't emit a warning for every problem
    libxml_use_internal_errors(TRUE);
    $dom->loadHTML($html);
    libxml_clear_errors();
    $head = $dom->getElementsByTagName('head')->item(0);
    if ( ! $head) {
        return FALSE;
    }
    // Get all `link` elements that are children of `head`
    $linkElements = $head->getElementsByTagName('link');
    foreach ($linkElements as $element) {
        if ( ! $element->hasAttribute('rel')) {
            continue;
        }
        // Split the rel up on whitespace because it can be `shortcut icon`.
        $rel = preg_split('/\s+/', strtolower(trim($element->getAttribute('rel'))));
        if (in_array('icon', $rel)) {
            $href = $element->getAttribute('href');
            // This may be a protocol-relative or relative URL.
            if (substr($href, 0, 2) === '//') {
                $href = 'http:' . $href;
            } elseif ( ! preg_match('~^https?://~i', $href)) {
                $href = $domain . '/' . ltrim($href, '/');
            }
            return resourceExists($href);
        }
    }
    return FALSE;
}
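The DOM part of the favicon lookup can be exercised on its own, without network access. A sketch of a hypothetical `findIconHref` that takes already-fetched HTML plus the site's base URL (e.g. `http://example.com`) and returns an absolute icon URL or null. The relative-URL resolution is deliberately simple and assumes the parsed page is the site's root page.

```php
<?php
// Hypothetical helper: find the first <link rel="... icon ..."> in $html and
// return its href as an absolute URL based on $base, or null if none exists.
function findIconHref($html, $base)
{
    if (trim($html) === '') {
        return null;
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate invalid HTML
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('link') as $link) {
        // rel can hold several space-separated tokens, e.g. "shortcut icon"
        $rel = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        if (!in_array('icon', $rel, true) || !$link->hasAttribute('href')) {
            continue;
        }
        $href = $link->getAttribute('href');
        if (substr($href, 0, 2) === '//') {
            return 'http:' . $href; // protocol-relative URL
        }
        if (preg_match('~^https?://~i', $href)) {
            return $href; // already absolute
        }
        return rtrim($base, '/') . '/' . ltrim($href, '/');
    }
    return null;
}
```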
If you want the function to return the favicon's URL instead, it is trivial to modify it.
$address = 'http://www.youtube.com/';
$domain = parse_url($address, PHP_URL_HOST);
or, from a database:
$domain = parse_url($row['address_column'], PHP_URL_HOST);
Display it with:
echo '<img src="http://www.google.com/s2/favicons?domain=' . $domain . '" />';
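A small sketch combining those steps, assuming the same Google s2 address; `googleFaviconUrl` is a hypothetical name. `http_build_query()` takes care of URL-encoding the domain, and `htmlspecialchars()` escapes the result before it goes into the attribute. Note that `parse_url()` returns no host for input without a scheme (such as `www.youtube.com`), so the helper returns null in that case.

```php
<?php
// Hypothetical helper: build the Google s2 favicon URL for a site address,
// or return null if no host can be extracted.
function googleFaviconUrl($address)
{
    $domain = parse_url($address, PHP_URL_HOST);
    if (!is_string($domain) || $domain === '') {
        return null;
    }
    return 'http://www.google.com/s2/favicons?' . http_build_query(array('domain' => $domain));
}

$src = googleFaviconUrl('http://www.youtube.com/');
if ($src !== null) {
    echo '<img src="' . htmlspecialchars($src) . '" alt="" />';
}
```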